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Abstract 

Most existing natural language database interfaces ('nldbs) were designed to be used 
with database systems that provide very limited facilities for manipulating time-dependent 
data, and they do not support adequately temporal linguistic mechanisms (verb tenses, 
temporal adverbials, temporal subordinate clauses, etc.). The database community is 
becoming increasingly interested in temporal database systems, that are intended to store 
and manipulate in a principled manner information not only about the present, but also 
about the past and future. When interfacing to temporal databases, supporting temporal 
linguistic mechanisms becomes crucial. 

We present a framework for constructing natural language interfaces for temporal 
databases ("nltdbsJ, that draws on research in tense and aspect theories, temporal logics, 
and temporal databases. The framework consists of a temporal intermediate represen- 
tation language, called top, an HPSG grammar that maps a wide range of questions 
involving temporal mechanisms to appropriate top expressions, and a provably correct 
method for translating from top to TSQL2, TSQL2 being a recently proposed temporal 
extension of the SQL database language. This framework was employed to implement a 
prototype nltdb. 



1 Introduction 

Since the 1960s, natural language database interfaces (nldbs) have been the subject of much 
attention in the natural language processing community E7|], |15[| , Q. nldbs allow users to 
access information stored in databases by formulating requests in natural language. Most 

'This paper reports on work that was carried out while the first author was in the Department of Artificial 
Intelligence, University of Edinburgh, supported by the Greek State Scholarships Foundation. 
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existing nldbs were designed to interface to database systems that provide very limited 
facilities for manipulating time-dependent data. Consequently, most nldbs also provide very 
limited temporal support. In particular, they are intended to answer questions that make 
only a superficial use of time, in the form of timestamp values that are just treated as a 
form of numerical data by the database language, and do not adequately support temporal 
linguistic mechanisms (verb tenses and aspects, temporal adverbials, temporal subordinate 
clauses, etc.): users are allowed to use very few (if any) of these mechanisms, and their 
semantics are typically over-simplified or even ignored. 

The database community is becoming increasingly interested in temporal database sys- 
tems. These are intended to store and manipulate in a principled manner information not 



only about the present, but also about the past and future [55|, [56]. In questions directed 
to temporal databases (e.g. ([]]) - (|3])), it becomes crucial for nldbs to interpret correctly 
temporal linguistic expressions. 

(1) What was the salary of each engineer while ScotCorp was building bridge 5? 

(2) Did anybody leave site 4 before the chief engineer had inspected the control room? 

(3) Which systems did the chief engineer inspect on Monday after the auxiliary generator 
was in operation? 

Temporal database systems currently exist only in the form of research prototypes (e.g. 
||), but we expect that commercially available temporal database systems will appear in the 
near future, because: (i) data storage is constantly becoming cheaper, and hence retaining 
huge quantities of information about the past is becoming more cost effective, (ii) there is a 
strong demand for tools to store and manipulate time-dependent data (e.g. in stock market 
and data mining applications), and (iii) research on temporal databases is maturing, to the 



extent that consensus proposals are beginning to appear [33j], [54]]. It is, therefore, desirable 
to investigate how interfaces to temporal database systems can be built. 

Typically, nldbs have used an intermediate representation language (usually some form 
of logic) to encode the meanings of natural language requests, with the resulting intermediate 
language expressions being available for translation into a suitable database language, often 
SQL |3Sf] . (The benefits of this include generality, modularity, portability Q.) Building on this 
architecture, we propose a framework for constructing nldbs for temporal databases (nltdbs) 
that draws on ideas from tense and aspect theories (see [ |l3| , [l4|] for an introduction), temporal 
logics [58|] , [29|] , and temporal databases. We emphasise that we do not set out to formulate an 
improved general tense and aspect theory or temporal logic. Our aim is a more practical one: 
to explore how ideas from these areas can be integrated in a framework that leads to practical 
nltdbs. We also concentrate on keyboard-entered English questions that refer to the past 
or the present, ignoring the following more difficult issues: questions about the future (that 
may require more complex semantic models), speech input (although our work is in principle 
compatible with the text having been spoken), and requests to update the database (see [l8|] 
for an outline of the difficulties with updates) . 

Ignoring some details, our framework consists of a formal temporal intermediate repre- 
sentation language, called top, an HPSG grammar [49 that maps a wide range of English 



questions involving temporal mechanisms to appropriate top expressions, and a provably 
correct mapping to translate from TOP to TSQL2, TSQL2 being a recently proposed extension 
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Figure 1: Architecture of the prototype NLTDB 



of SQL |^4 |. To demonstrate that this framework is workable, we have employed it to imple- 
ment a small prototype nltdb, using ale ||, ||] and Prolog^] Figure [l] shows the architecture 
of that system. 

Each English question is first parsed using the HPSG grammar, generating an HPSG sign 
(a feature-structure that represents syntactic and semantic properties of the question; signs 
and the components of the HPSG grammar will be discussed in section |]). Multiple signs are 
generated for questions that the parser understands to be ambiguous. A TOP formula is then 
extracted from each sign and the extracted formula undergoes an additional post-processing 
phase (to be discussed in section |4.8| ) . 

The top formulae that are generated at the end of the post-processing capture what 
the NLTDB understands to be the possible readings of the English question. (The prototype 
NLTDB makes no attempt to determine which reading the user has in mind.) These formulae 
are then translated into TSQL2. (The h! functions of figure [l] map primitive TOP expressions 
to TSQL2 expressions. They will be discussed in section |6.1| .) 

The prototype NLTDB prints all the resulting TSQL2 queries and the corresponding top 
formulae. 

The generated TSQL2 queries would be executed by the underlying database management 
system (dbms) to retrieve the information requested by the user. A prototype database 
system, called timedb, that supports a version of TSQL2 now exists.0 During the work that 
we report here, however, timedb was not yet available. As a result of this, our prototype 

x The prototype nltdb is available from http://www.dai.ed.ac.uk/groups/nlp. 

2 timedb was developed by Andreas Steiner at ETH Zurich. It is available from http://www.cs.auc.dk/ 
general/DBS/tdb/TimeCenter. timedb compiles TSQL2 statements into sequences of conventional SQL state- 
ments, which are then executed by a commercial database system (oracle). 
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NLTDB has not been linked to timedb. Hence, the TSQL2 queries are currently not executed, 
and no answers are produced. We hope that future work will link the two systems. This task 
is complicated by the fact that the two systems adopt different versions of TSQL2, since we 
had to introduce some modifications to TSQL2 (see section |5|), and timedb also introduces 
some modifications. 

The lexicon entries and sort hierarchy of the grammar, and the definitions of the h! 
functions of the top to TSQL2 translator are the only parts of the prototype nltdb that need 
to be revised when configuring the nltdb for a new database. 

The parser, the grammar, and the formula extractor were implemented using ale ]8], |9| . 
ALE provides a chart-parser (which is the one used in the prototype nltdb), and a formalism 
for expressing typed feature-structures, Prolog-like rules, and unification-based grammars.^ 
The post-processor and the top to TSQL2 translator were implemented in Prolog. 

We emphasise that the prototype nltdb is intended only to demonstrate that our theoret- 
ical framework is implementable. Several modules would have to be added to the prototype 
NLTDB, if it were to be used in real- life applications (see section ||). 

The remainder of this paper is organised as follows: section |2| discusses some ideas from 
tense and aspect theories that are used in our framework; section |3| describes top, our inter- 
mediate representation language; section || presents the HPSG-based mapping from English to 
TOP; section proffers a brief introduction to TSQL2; section ^ highlights the TOP to TSQL2 map- 
ping; section compares our approach to previous proposals on nltdbs; section |8| concludes 
and proposes directions for further research. 

Although several issues (summarised in section ||) remain to be addressed, we argue that 
our framework constitutes a significant improvement over previous proposals on nltdbs. 



2 Aspectual classes 



It is common in tense and aspect theories to classify natural language expressions or situations 
described by natural language expressions into aspectual classes (or "aktionsarten" ) , a prac- 
tice often attributed to |5S|. Numerous aspectual taxonomies have been proposed (see, for 



example, [f25|, 4lq| , [43|, [|4j], |60[). These differ in the number and names of the classes, and 
the nature of the classified objects. What is common to all of them is that they try to capture 
intuitions about the way that native speakers view situations, with respect to issues such as 
whether the situation is instantaneous or more drawn out, whether the situation involves an 
event which is completed or not, etc. Although most of this work has been motivated by 
the goal of capturing linguistic generalisations, these nuances of meaning are also relevant to 
ensuring that a nltdb operates in a correct and comprehensible manner. 



2.1 Our aspectual taxonomy 

For our purposes, we have found a taxonomy that distinguishes between states, points, cul- 
minating activities, and activities to be adequate. Our four classes correspond to ways of 
viewing situations that people seem to use when communicating in English: a situation can 
be viewed as involving no action (state view), as an instantaneous action (point view), as 

3 The ALE grammar of the prototype nltdb is based on previous ALE encodings of HPSG grammars by Penn 
and Carpenter at Carnegie Mellon University, and Manandhar and Grover at the University of Edinburgh. We 
are grateful to all four of them. ALE is available from http://macduff.andrew.cmu.edu/ale. 
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an action with a climax (an inherent completion point; culminating activity view), or as an 
action with no climax (activity view). (The use of "culminating activities" is derived from 
p2|.) For example, "Tank 5 contains oil." is typically uttered with a state view in mind, 
while "Tank 5 exploded." would be used in most contexts with a point view. "ScotCorp built 
a new bridge." would be typically uttered with a culminating activity view. In this case, the 
building situation is seen as having an inherent completion point (the point where the entire 
bridge has been built). If the building stops before reaching this point, the building situation 
is considered incomplete. When a verb is used in the simple past in a sentence uttered with a 
culminating activity view, the speaker usually implies that the climax was reached (the build- 
ing of the bridge was completed). In contrast, "Robot 2 moved." would probably be uttered 
with an activity view. In this case, there is no climax (the movement can stop at any point 
without being any more or less complete), and the simple past carries no implication that 
any climax was reached. If, however, a particular destination were provided, as in "Robot 2 
moved to the exit.", the sentence would be uttered with a culminating activity view, and the 
climax would be the point where the robot reaches the exit. 

Determining which view the speaker has in mind is important to understand what the 
speaker means. For example, when an "at ... " temporal adverbial is attached to a clause 
uttered with a state view (e.g. "Tank 5 contained oil at 5:00pm."), the speaker usually means 
that the situation of the clause simply holds at the adverbial's time. Excluding special contexts 
that do not arise in NLDBs, there is normally no implication that the situation starts or stops 
holding at the adverbial's time. In contrast, with clauses uttered with culminating activity 
views, "at ... " adverbials (if felicitous) usually specify the time where the situation starts 
or reaches its climax (e.g. (|j)). 

(4) Robot 2 moved to the exit at 5:00pm. 

Some linguistic markers seem to signal which view the speaker has in mind. For example, the 
progressive usually signals a state view (e.g. unlike (Q) , (|5|) is typically uttered with a state 
view, and the movement is simply ongoing at 5:00pm). 

(5) Robot 2 was moving to the exit at 5:00pm. 

Often, however, there are no such explicit markers. The processes employed in those cases by 
hearers to determine the speaker's view are not yet fully understood. In a nltdb, however, 
where questions refer to a restricted domain, reasonable guesses can be made by observing that 
in each domain, each verb tends to be associated mainly with one particular view. Certain 
agreements about how situations are to be viewed (e.g. that some situations are to be treated 
as instantaneous - point view) will have also been made during the design of the database. 
These agreements provide additional clues about the main view of each verb in the domain. 

More precisely, we adopt the following approach. Whenever the nltdb is configured for 
a new application domain, the base form of each verb is assigned to one of the four aspectual 
classes, using criteria (to be discussed below) intended to determine the main view of each verb 
in the particular domain. Aspectual class is treated as a property not only of verbs, but also 
of verb phrases, clauses, and sentences. Normally, all verb forms inherit the aspectual classes 
of the corresponding base forms. Verb phrases, clauses, and sentences normally inherit the 
aspectual classes of their main verb forms. Some linguistic mechanisms (e.g. the progressive), 
however, may cause the aspectual class of a verb to differ from that of the base form, or the 
aspectual class of a verb phrase, clause, or sentence to differ from that of its main verb form. 
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In the case of a verb like "to run", that typically involves a culminating activity view 
when used with an expression determining a specific destination or distance (e.g. "to run 
to the station/Eve miles"), but an activity view otherwise, we assume that there are two 
different homonymous verbs "to run": one has a culminating activity base form, and requires 
a complement determining a specific destination or distance; the other has an activity base 
form and requires no such complement. A similar distinction is introduced with verbs whose 
aspectual class depends on whether the verb's object denotes a countable or mass entity (e.g. 
"to drink a bottle of wine" vs. "to drink wine") |^|. Similarly, when a verb can be used in an 
application domain with both habitual and non-habitual meanings (e.g. "BA737 departs from 
Gatwick." (habitually, perhaps every day) vs. "BA737 (actually) departed from Gatwick five 
minutes ago."), we distinguish between homonyms with habitual and non-habitual meanings. 
Our aspectual criteria classify the base forms of habitual homonyms as states. The aspectual 
classes of non-habitual homonyms depend on the verb and the domain. 

In the rest of this paper, we call verbs whose base forms are classified as states, activi- 
ties, culminating activities, and points, state, activity, culminating activity, and point verbs 
respectively. This does not rule out other (non-base) forms of these verbs belonging to other 
classes. 



2.2 Aspectual criteria 

It is important to realise that our aspectual classes play a purely formal role, in the sense 
that their whole purpose is to allow the systematic translation of English into our semantic 
representation, in a way that achieves plausible effects within a nltdb. Despite our attempts, 



in section 2.1 above, to provide an informal meaning for our classes, they are not accessible 
to linguistic intuition. Verbs are located in the classification scheme using particular criteria, 
which we outline below, not by considering whether or not they appear to meet some semantic 
condition suggested by the class name. For example, we have found it convenient to group 
"habitual" usage with "states" , but this does not embody an empirical claim that "habitual" 
sentences do not describe actions. 

The criteria we use to classify the base verb forms in each application domain are three, 
and they are applied sequentially. First, the simple present criterion is used: if the simple 
present of a verb can be used in the particular domain in single-clause questions with non- 
futurate meanings, it is a state verb; otherwise, it is a point, activity, or culminating activity 
verb. For example, in domains where (|^) is possible, "to contain" is a state verb. 

(6) Does any tank contain oil? 

The simple present is sometimes used to refer to scheduled situations. For example, (^) could 
refer to a particular scheduled assembling. We consider this meaning of (^) a futurate one. 
Hence, (^) does not constitute evidence that "to assemble" is a state verb. 

(7) When does J.Adams assemble engine 5? 



As explained in section 2.1, when verbs have both habitual and non-habitual uses, we distin- 
guish between habitual and non-habitual homonyms. Excluding scheduled meanings, (||) can 
only have a habitual meaning, i.e. it can only involve the habitual homonym of "to land". 
Therefore, in domains where @ is possible, the habitual "to land" is a state verb. @ does 
not constitute evidence that the non-habitual "to land" is a state verb. 
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(8) Which flight lands on runway 2? 

Next, the point criterion is applied to distinguish point verbs from activity or culminating 
activity verbs. The point criterion is based on the fact that some verbs describe kinds of 
situations that are always modelled in the database as instantaneous. If a verb describes 
situations of this kind, it is a point verb; otherwise it is an activity or culminating activity 
verb. Many of our illustrative examples come from a hypothetical airport domain (described 
in Q). In that domain, we assume the database does not distinguish between the times at 
which a flight starts or stops entering an airspace sector: entering a sector is modelled as 
instantaneous. Also, in our airport domain "to enter" is used only to refer to flights entering 
sectors. Consequently, in that domain "to enter" is a point verb. (If "to enter" were also used 
to refer to groups of passengers entering planes, and if situations of this kind were modelled 
as non- instantaneous, one would have to distinguish two homonymous "to enter": one used 
with flights entering sectors, and one with groups of passengers entering planes. Only the the 
first homonym would be a point verb.) 

Finally, the imperfective paradox criterion is applied to decide if the remaining verbs are 
activity or culminating activity ones. The criterion is based on the "imperfective paradox" 
fl24j , []38|| . Yes/no questions containing the past continuous and the simple past of the verbs 
(e.g. @ - (|I2"D) are considered. (The past continuous questions must not be read with futurate 
meanings, e.g. (^) must not be taken to ask if John was going to run.) If an affirmative answer 
to the past continuous question implies an affirmative answer to the simple past question (as 
m @ - ©), the verb is an activity verb; otherwise (as in ( |ll| ) ~~ (|l2|)), it is a culminating 



activity verb. ((|Tl|) does not imply (p^): John may have abandoned the building before 
completing the house.) 

(9) Was John running? 

(10) Did John run? 

(11) Was John building a house? 

(12) Did John build a house? 

We show below how some base verb forms are classified in our airport domain. 

states: contain, be (non-auxiliary), arrive (habitually), depart (habitually), service (habitu- 
ally) 

activities: circle (waiting permission to land), taxi (no destination), queue (for runway) 

culminating activities: land, take off, inspect, board, service (actually), taxi (to destina- 
tion) 

points: enter, start, begin, stop, finish, leave, arrive (actually), depart (actually) 

We stress that the classification depends on the uses of the verbs in the particular domain, 
and on how the situations of the verbs are modelled in the database. For example, our 
hypothetical airport database models arrivals and departures as instantaneous, and hence the 
corresponding verbs are classified as point verbs. If arrivals and departures were not modelled 
as instantaneous, the corresponding verbs would probably be classified as culminating activity 
ones. 



7 



3 The TOP language 



We now turn to top, our intermediate representation language. Although we avoid referring 
to it as a "logic" (on the grounds that it has no proof theory), it is in many respects similar to, 
and closely based on, traditional logics, and it has a model theory much as one would define 
for a genuine temporal logic. However, it is far from being a full logic, as the emphasis in 
its design has been on reflecting natural language semantics rather than covering inferential 
phenomena. A formal summary of TOP is given in the appendix; although this is rather terse 
owing to space limitations, a fuller account can be found in chapter 3 of [§]. 

top contains constants and variables, and atomic formulae are constructed by applying 
predicate symbols to arguments. Formulae can be conjoined, but there are currently no 
negation or disjunction connectives. (We did not need these for the linguistic phenomena that 
we focused on. We plan to include these in future TOP versions.) Similarly, there are currently 
no quantifier symbols, with variables essentially being treated as existentially quantified. The 
most relevant constructs for the current discussion are the temporal operators (which were 



influenced by those of |16| ). For example, (|13|) is represented as (|14[). Roughly speaking, 
the Past operator (introduced by the verb tense) requires contain(tk2, water) to be true at 
some past time e v (the suffix marks variables), and the At operator (introduced by the 
adverbial) requires that time to fall within 1/10/95. (Dates are shown in the day /month/year 
format.) 

(13) Did tank 2 contain water (some time) on 1/10/95? 

(14) At[l/ 10/95 , Past[e v , contain(tk2, water)]} 

As we are not primarily interested in the structure of referring expressions, we will make the 
simplifying assumption that particular items in the domain such as "tank 2" are associated 
with simple constants such as tk2 rather than some more complex logical expression. The 
prototype nltdb currently requires "tank 2" to be typed as a single word, which is associated 
in the lexicon with tk2. Notice that tk2 is a top constant, not a database value. 

3.1 TOP basics 

top assumes that time is linear, discrete, and bounded [p8fl. Following the Reichenbachian 



tradition [51], top formulae are evaluated with respect to more than one times: st (speech 
time, the time where the question is submitted to the nltdb), et (event time, the time where 
the situation denoted by the formula holds), and It (localisation time, a temporal window 
within which the event time must be placed (cf. Jig] ; this is different from Reichenbach's 
"reference time" and similar to the "location time" of P4||). While st is always a time-point, 
et and It are generally periods, and hence TOP can be considered period-based. (To avoid 
conflicts with TSQL2 terminology, we use the term "period" to refer to what temporal logicians 
usually call "intervals", i.e. convex sets of time-points.) 

Although the aspectual classes of natural language expressions affect how these expressions 
are represented in TOP, it is not always possible to tell the aspectual class of a natural language 
expression by examining its TOP representation. The approach here is different from those 



of [24 1, H25| l, [pq [, and |38|| , where aspectual class is a property of formulae or denotations of 
formulae. 

Returning to (|i~3|), the answer will be affirmative if (|i~4| ) evaluates to true. When evalu- 
ating top formulae, It initially covers the entire time axis, though it may be narrowed down 
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subsequently by temporal operators. In (14), the At operator narrows It to its intersection 
with 1/10/95. This causes It to become 1/10/95. The Past operator then narrows It to its 
intersection with [tfi rs t, st), where i/j rs t is the earliest time-point. (In this particular example, 
if 1/10/95 is entirely in the past, the Past operator does not restrict It any further.) The 
overall formula evaluates to true iff there is an event time period et, where contain(tk2, water) 
is true, such that et C It. (pi is a subperiod of p2, written p\ C p2, iff Pi,P2 are periods and 
Pi Q P2-) 

Although here we provide only informal descriptions of the semantics of top formulae, 
TOP has formally defined model-theoretic semantics (see Appendix [A] and [0]). Among other 
things, a top model (which is ultimately defined in terms of database constructs), specifies 
the maximal event time periods where the situations denoted by top predicates hold (e.g. 
in the case of contain(tk2, water), it specifies the maximal periods where tank 2 contained 
water), top predicates are always homogeneous. This means that if a predicate is true at an 
event time et, it is also true at any event time et' C et. (Various versions of homogeneity have 



been used in |l|, |36j], |38|], |52[, and elsewhere.) In our example, if tank 2 contained water 
from 29/9/95 to 2/10/95, contain(tk2, water) is true at any event time that is a subperiod 
of that period. Hence, there will be an et where contain(tk2, water) is true, such that et is 
also a subperiod of 1/10/95 (the It), and the answer to (^) will be affirmative. 

3.2 Interrogative quantifiers 



In (14), the semantics of the Past operator bind the variable e v to et (the event time, where 
tank 2 contained water). The e v argument of the Past operator is useful, for example, in 
time-asking questions like (|i~5|), expressed as (|l~6|). The interrogative quantifier (l mx i) of ( |l6|) 
reports the maximal past ets where contain(tk2, water) is true, such that et is a subperiod 
of 1/10/95. 

(15) When on 1/10/95 was tank 2 empty? 

(16) ? mx ie v At[l/10/95,Past[e v ,empty(tk2)}} 

A simpler version of the interrogative quantifier is used in non-time asking questions like 
(|j~7|), expressed as (fl8|). 



(17) What did tank 2 contain on 1/10/95 at 5:00pm? 

(18) ?w v Part[5-00pm9,f v ]AAt[l/10/95,At[f v ,Past[e v ,contain(tk2,w v )]]] 



The 5.00pm 9 in @ denotes the set of all 5:00pm-periods (5:00:00-59 on 2/2/95, 5:00:00-59 
on 3/2/95, etc.). The " 9 " suffix stands for "gappy partitioning": the set of all 5:00pm-periods 
is gappy, in the sense that the union of its elements does not cover the entire time-axis. 
Part[5 -00pm 9 , f v ] means that f v must be a 5:00pm-period. The two At operators cause It to 
become the particular 5:00pm-period of 1/10/95. e v , the event time where contain(tk2,w v ) 
holds, must fall in the past and within 

3.3 More on At, Before, and After 

Two additional operators, Before and After, are used to express adverbials like "before 
1/10/95" or "after 5:00pm". Like At, these operators narrow It. The first arguments of 



9 



At, Before, and After can also be formulae. This form of the operators is used when ex- 
pressing subordinate clauses introduced by "while", "before", or "after" respectively. In (p0|), 
which represents one possible reading of (19), the At operator causes Past[e2 v , free(run2)] 



to be evaluated with respect to a new It'. It' is the intersection of the original It (the whole 
time-axis) with a maximal event time period eV where Past [el v , circling (6a 737)] is true. ( p0| ) 
evaluates to true iff there is a past event time e2 v IZ It' where free(run2) is true. 

(19) Was runway 2 free while BA737 was circling? 

(20) At[Past[el v ,circling(ba737)], Past[e2 v , free(run2)]] 

3.4 The Fills operator 



(|20| ) represents the reading of (|l9|) whereby runway 2 must have been free some time while 
BA737 was circling. There is, however, a stricter reading, that requires runway 2 to have been 
free throughout BA737's circling. (Some speakers report another, preferred, reading in which 
the runway had to be free for most of the time that BA737 was circling. As a simplification, we 
will not handle that dialect variant, which would be semantically complicated.) Readings of 
this kind are very common when states combine with adverbials or clauses that are understood 
as specifying non-instantaneous periods. For example, (^Tj) has a reading where tank 2 must 
have been empty some time on 1/10/95, and a stricter (and probably preferred) one where 
the tank must have been empty throughout that day. 

(21) Was tank 2 empty on 1/10/95? 

For simplicity, the stricter readings are currently ignored in our English to top mapping (and 
the rest of this paper). These readings, however, can be expressed in top using the Fills 
operator, which requires the event time to cover exactly the localisation time. For example, 



the stricter reading of ( |19| ) can be expressed as (22). The Fills operator of (p2j) requires e2 v 
(event time where runway 2 was free) to be set to It' (maximal past period where BA737 was 
circling). The stricter reading of (|2l|) is expressed as (E3). 



(22) At[Past[el v , circling (ba737)], Past[e2 v , Fills[free(run2)}}} 

(23) At[l/10/95,Past[e v ,Fills[empty(tk2)]]] 

3.5 The Culm operator 

In the case of activity, state, or point verbs (e.g. "to circle", "to contain", and "to depart" 
in our airport domain), progressive and non-progressive forms receive the same top repre- 
sentations. For example, both (p3) and (p5|) are represented as (26), which requires an et 



where BA737 was circling to exist in the past. (Progressive forms of state verbs are typically 
unfelicitous, but for our purposes it is harmless to allow them. The use of point verbs in the 



progressive, as in (27), usually signals that the user is not aware that the situation of the verb 
is modelled as instantaneous. A warning to the user should be generated in the latter case. 
Our prototype nltdb, however, currently does not generate such messages.) 

(24) Was BA737 circling? 

(25) Did BA737 circle? 
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(26) Past[e v ,circling(ba737)] 



(27) Was BA737 departing at 5:00pm? 

With culminating activity verbs (e.g. "to inspect" in the airport domain), the representation 
of the non-progressive contains an additional Culm operator. ( p8|) and (^0|) are represented 
as (p9|) and (|3l]) respectively. The Culm operator requires et to begin at the start-point of 
the earliest maximal period where the situation of its argument holds (the argument of Culm 
is always an atomic formula), and to end at the end-point of the latest of those maximal 
periods. Furthermore, the situation of the predicate must reach its climax at the end-point 
of et. 

(28) Was J.Adams inspecting BA737? 

(29) Past[e v ,inspecting(ja, 6a737)] 

(30) Did J.Adams inspect BA737? 

(31) Past[e v , Culm[inspecting(ja,ba737)]] 

Let us assume, for example, that J.Adams started to inspect BA737 at 9:00pm on 1/10/95, 
stopped the inspection at 1:00pm, resumed it at 2:00pm, and finished it at 4:00pm on the 
same day. The maximal periods where the situation of inspecting^] 'a, ba737) holds are from 
9:00pm to 1:00pm and from 2:00pm to 4:00pm on 1/10/95. Let us call p m xli and p m xl2 these 
maximal periods. (|29|) requires an et where the predicate holds (i.e. a subperiod of p m xii or 
Pmxn) to be a subperiod of [tfi rs t,st) (the It). That is, there must be some past time where 
J.Adams was inspecting BA737. (|31~1), on the other hand, would, with the database values 
assumed here, require et to be the period from 9:00pm to 4:00pm on 1/10/95, the inspection 
to reach its climax at the end of et, and et C [tfi rs t, st). That is, the entire inspection must 
be in the past. If the questions were submitted at 3:00pm on 1/10/95, (f29|) would be true, 
but (|3l]) would be false. This accounts for the fact that ( |30|) requires the inspection to have 
been completed, while (^|) does not. 

We note that our approach does not run into problems with the "imperfective paradox" 
((|3l|) implies (|29"|), but not the opposite; cf. the logic of [jl6|), and this is achieved without in- 
voking branching-time or possible worlds (cf. [P4l , {3f|), that are difficult to model in practical 
temporal database systems. 

3.6 Episode identifiers 

With culminating activity verbs, we often use an extra episode-identifying argument (the 
variable ep v in (|33|); ep v should also, for correctness, be present in (|29|) and (|3l|)). This 
is needed in sentences like (|32[), if J.Adams was involved in several different inspections of 
BA737. (We treat assertions like ( j32"l) as yes/no questions; variables can be thought of, 
informally, as existentially quantified.) 

(32) J.Adams inspected BA737 on 1/10/95. 

(33) At[l/10/95,Past[e v , Culm[inspecting(ep v , ja,ba737)}]] 



11 



Let us assume, for example, that apart from the inspection of 1/10/95, J.Adams was also in- 
volved in another inspection of BA737, that started at 11:00pm on 1/1/95, and stopped 
(without ever reaching its completion) at 4:00am on 2/1/95. Let us call p m xi3 

this pe- 
riod. We use the episode-identifying argument to distinguish between the two inspections: 
inspecting (epl, ja, ba737) is true for et C p m xii or et C p mx i2, while inspecting(ep2, ja, 6a737) 
is true for et C p mx i 3 . 

The preferred reading of (|32| ) is that an entire inspection (from start to completion) of 
BA737 by J.Adams occurred within 1/10/95. Without the episode-identifying argument, this 
reading is not captured correctly by (|33|): (|33| ) would require et to extend from the beginning 
of Pmxiz ( on 1/1/95) to the end of p m xi2 (on 1/10/95), and et to be a subperiod of 1/10/95. 
Since this is impossible, the answer would (incorrectly) be false. With ep v , in contrast, ( j33|) 
requires et to extend from the beginning of the earliest maximal period that corresponds to a 
particular value of ep v (epl or ep2) to the end of the latest maximal period that corresponds 
to the same ep v value, et must also be a subperiod of 1/10/95, and the inspection that 
corresponds to ep v, s value must reach its climax at the end of et. For ep v = ep2, all these 
conditions are satisfied, and hence the answer would (correctly) be affirmative. 



3.7 The For operator 

Duration adverbials are represented using the For operator. For example, (|34l) is mapped 
to (j35"l). day c and minute c represent the sets of all day- and minute-periods respectively. 
(The " c " suffix stands for "complete partitioning": the sets of all day- and minute-periods 
are complete, in the sense that the union of each set's elements is the entire time-axis.) The 



For operator of (35) requires et (the time of the circling) to cover 40 consecutive minutes. 

(34) On which day was BA737 circling for forty minutes? 

(35) ?<f Part[day c , d v \ A At[d v , For[min c , 40, Past[e v , cir cling (6o737)]] 



3.8 The Begin and End operators 

Verbs like "to start", "to stop", and "to finish" are expressed using the Begin and End 



operators. (|3q) and (|38|) are mapped to (|37|) and (|39|) respectively. (|37|) requires the end- 
point of a maximal period where the inspection is ongoing to be located at some past 5:00pm- 
minute. (39) requires the completion of an inspection to fall within a past 5:00pm-minute. 

(36) J.Adams stopped inspecting BA737 at 5:00pm. 

(37) Part[5:00pm9 , f v ) A At[f v , Past[e v , End[inspecting{ep v , ja, 6a737)]]] 

(38) J.Adams finished inspecting BA737 at 5:00pm. 

(39) Part[5:00pm9 , f v ] A At[f v , Past[e v , End[Culm[inspecting(ep v , ja, 6a737)]]]] 



3.9 The Pres and Perf operators 

The past perfect is expressed using a Past followed by a Perf operator. For example, (|40l) is 
mapped to pl|). The Perf operator introduces a new event time (denoted by e2 v in (|4l|)), 
that has to precede the original one (el"). (41) requires the time of the departure (e2 v ) to 
precede another past time (el"). The latter corresponds to Reichenbach's reference time IpTf , 
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a time from which the departure is viewed. With culminating activity verbs, a Culm operator 
is also inserted (e.g. (fl|) receives (fl3"l)), 

(40) BA737 had departed. 

(41) Past[el v , Perf[e2 v ,depart(ba737)]] 

(42) Had J.Adams inspected BA737? 

(43) Past[el v , Perf[e2 v , Culm[inspecting(ep v , ja,ba737)]]] 

A sentence like (p4|) has two readings: "on 1/1/95" may specify the reference time, or it may 



refer directly to the inspection. (In the latter case, (44) is similar to "Did J.Adams inspect 



BA737 on 1/1/95?", except that it creates an impression of a longer distance between the 



inspection and the present.) We represent the two readings as (45) and ([!(]) respectively. 

(44) Had J.Adams inspected BA737 on 1/1/95? 

(45) At[l/l/95, Past [el" , Perf [e2 v , Culm [inspecting (ep v , ja, 6a737)]]] 

(46) Past [el v , Perf[e2 v , At[l/l/95 , Culm[inspecting(ep v ,ja, 6a737)]]] 



requires the reference time (el") to be a subperiod of 1/1/95, and to follow an entire 
inspection of BA737 by J.Adams. Apart from introducing a new et, the Perf operator also 
resets It to the entire time-axis. Hence, in (|45|), e2 v does not need to be a subperiod of 1/1/95. 
In (^), the At operator narrows It after the Perf operator has reset it. This requires the 
inspection to be located within 1/1/95. 

It has often been claimed (e.g. M, p2|, pQ]) that the English present perfect asserts that 



some consequence of a past situation holds at the present. For example, (47) seems to imply 
that the engine is still on fire, or that it was damaged by the fire and has not been repaired. (A 
similar connection may exist between the past situation and the reference time in some uses 
of the past perfect.) This implication does not seem to be present (at least not as strongly) 
in®. 

(47) Engine 5 has caught fire. 

(48) Engine 5 caught fire. 

Although these claims are intuitively appealing, it is difficult to see how they could be used in 
a nltdb. Perhaps in (|49|) the nltdb should check not only that the landing was completed, 
but also that some consequence of the landing still holds. But what should this consequence 
be? Should the nltdb check that the plane is still at the airport, and should the answer be 
negative if the plane has departed again? Should the nldb check that the passengers are still 
at the airport? Given this uncertainty, we do not require the past situation to have present 
consequences. 

(49) Has BA737 landed? 

The present perfect could be expressed using a Pres and a Perf operator. (The Pres 
operator simply requires st to be located within et.) For example, ( |49| ) would be represented as 



(|50j). Given, however, that we do not require the past situation to have present consequences, 
there are very few differences between the present perfect and the simple past for our purposes. 
For simplicity, we represent the present perfect in the same way as the simple past, i.e. both 
© and flU) receive fl5|). 
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(50) Pres[Perf[e v , Culm[landing{ep v ,6a737)]]] 



(51) Did BA737 land? 

(52) Past[e v , Culm[landing(ep v , 6a737)]]] 

3.10 The Ntense operator 

We end the presentation of top with the Ntense operator (borrowed from |l6|]). This is useful 
in sentences like (|5^), where "president" may refer either to the present (the current president 
visited Rome, perhaps before becoming president), or to the time of the visit (the visitor was 
the president of that time). In J54|), the Ntense operator requires p v to be the president at 
st (indicated by now*). In contrast, in (|55|) the Ntense requires p" to be the president at the 
time of the visit (e v ). 

(53) The president visited Rome in 1991. 

(54) Ntense[now* , president (p v )] A At[yl991, Past [e v , visiting(p v , rome)]] 

(55) Ntense[e v , president (p v )] A At[yl991, Past[e v ,visiting(p v ,rome)]] 
We discuss (^) further in section |4.6| . 



4 From English to TOP 

The English requests are mapped to top formulae using an HPSG-based grammar |4J|. An 
hpsg grammar consists of schemata, principles, lexical rules, lexicon entries, and a sort hier- 
archy. Our grammar is very close to that of Pollard and Sag, with the main difference being 
that we use TOP expressions instead of situation-theoretic constructs. We have also added an 
aspect feature and principle to accommodate our aspectual taxonomy, and lexical signs and 
rules for verb forms, temporal adverbials, and other temporal mechanisms. 

This section attempts to offer a taste of how our HPSG-based mapping from English to 
TOP works (the full details can be found in 0]). Readers with previous exposure to modern 
unification-based grammars should be able to follow most of the discussion. Some of the 
details, however, may be unclear to readers not familiar with hpsg. 



4.1 Verb forms 



In HPSG, each word and syntactic constituent receives a sign, a feature-structure of a particular 
form, that provides information about the word or syntactic constituent. (|56|) shows the 
(lexical) sign of the base form of "to inspect" in the airport domain. (To save space, we 
ignore some uninteresting features, and we write "ss" instead of "synsem".) 



(56) 



PHON (inspect) 



SS LOC CAT 



VFORM 

HEAD 

AUX 

verb 

ASPECT culmact 
SUBJ 
COMPS 



( NP [Tj ) 
{ NP [2] ) 

ss I cont inspecting (ep v ,CE[,E) 



bse 
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According to (pq), "inspect" is the base form of a non-auxiliary verb, it is a culminating 
activity, and it requires a noun-phrase as its subject, and one as its object. The base form 
introduces a TOP expression of the form inspecting (ep v H]), where [T] and [2] correspond to 
the entities denoted by the subject and object respectively. (More precisely, Q] and [2] stand for 
hpsg indices. In our HPSG version, indices represent TOP constants and variables. Our indices 
have a special feature, not shown here, that distinguishes constant- from variable-representing 
indices.) The CONT values are actually feature-structures that represent TOP expressions. For 
simplicity, however, here we show directly the corresponding TOP expressions. 

The signs for non-base forms of non-auxiliary verbs are generated automatically from the 
corresponding base form signs by lexical rules. For example, the simple past ( |57| ) is generated 
from ( |56|) by a lexical rule that inserts a Past and a Culm in the CONT of the base sign. 
The lexical rule also modifies phon for simple past morphology, and changes vform to fin 
(finite form, a form that requires no auxiliary verb). The simple past signs of state, point, 
and activity verbs are generated by a similar rule, which does not insert, however, a Culm 
operator. 



(57) 



phon (inspected) 



VFORM fin 
AUX - 



HEAD 

verb 

^ Loci cat aspect culmact 
SUBJ { NPrjJ } 
COMPS ( NPnjl ) 

ss I cont Past[e v , Culm[inspecting(ep v ,\H,\S)]] 



When "inspected" combines with its subject and object in (|5§|), the CONT value of (| 



becomes (59), a value which is inherited by the sign of the overall sentence. Roughly speaking, 
the top representation of a sentence is the CONT value of the sentence's sign. (We will refine 
this statement in the following paragraphs.) 

(58) J.Adams inspected BA737. 

(59) Past[e v , Culm[inspecting(ep v , ja,ba737)]] 

Present participle signs are the same as the corresponding base ones, except that their 
VFORM is prp. In (|60|), for example, apart from its vform, the sign for "inspecting" is the 
same as (|56| ) (i.e. "inspecting" is also a culminating activity form). ( |6ll) shows the sign of 
"was" that is used in past continuous forms. 

(60) J.Adams was inspecting BA737. 



(61) 



PHON (was) 



SS LOC CAT 



VFORM fin 
AUX + 



HEAD 

verb 
ASPECT state 

SUBJ (H) 

SUBJ 



COMPS (VP 



VFORM 



prp 



m ) 



SS I CONT Past[e v ,\2§ 
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According to (|6l|), "was" requires a present participle verb phrase complement (e.g. "inspect- 
ing BA737"). The subject of "was" (CD) is the same as the subject of the complement. The 
CONT of "was" is also the same as the CONT of the verb phrase complement ([U), but with an 
additional Past operator. In (f30|), when "was" combines with its complement ("inspecting 
BA737") and its subject ( "J.Adams"), the CONT of @ b ecomes ( |62] ) . The sign of the overall 



(|g) inherits the CONT of (gg) (i.e. (gg) is mapped to (§2|)). 
(62) Past[e v ,inspecting(ep v , ja, 6a737)] 

The propagation of the aspect feature is controlled by our aspect principle: syntactic 
constituents inherit the aspect of their head-daughter (the "was" in "was inspecting"), except 
if the head-daughter combines with an adjunct-daughter (a modifier), in which case the mother 
syntactic constituent inherits the aspect of the adjunct-daughter (an example of the latter 
case will be given in section |4.3j ). In (p0|), the "was inspecting" (and then the overall sentence) 
inherits the state aspect of "was". In effect, the progressive has caused an aspectual shift 
to state ("inspecting" was a culminating activity). Along with our treatment of "at ... " 
temporal adverbials (discussed in section [4.3| below), this shift accounts for the fact that in 
"J.Adams was inspecting BA737 at 5:00pm." the inspection was probably simply ongoing at 



5:00pm (cf. "J.Adams inspected BA737 at 5:00pm"). A similar shift is assumed in [42], 
30|, and elsewhere. 



4.2 Habituals and non-habituals 



In the case of verbs with multiple homonyms (section ^), the lexicon provides multiple signs 



for the base forms. ( p3[ ) and (64), for example, correspond to the habitual and non-habitual 
homonyms respectively of "to service". (In our airport domain, each flight is habitually 
(normally) serviced by particular servicing companies, though sometimes a flight may be 
serviced by a company other than its normal one.) 



(63) 



(64) 



phon (service) 



VFORM 
AUX 



bse 



HEAD 

verb 

ss| loc cat aspect state 

SUBJ ( NP [Tj) 
COMPS (NPra) 

ss I cont hab-serv(\M, 0) 



phon (service) 



VFORM 
AUX 



bse 



HEAD 

verb 

SS | LOC | CAT ASPECT culmact 
SUBJ ( NP [Tj) 
COMPS (nP[|]) 
SS | CONT servicing (ep v ,1X1, HI) 

(|63| ) classifies the habitual homonym as a state verb, and introduces a predicate of the form 
hab_serv(\i\, \M), which is intended to hold at times where [D (the servicer) has the habit of 
servicing \2\ (the flight). (|64|) classifies the non-habitual homonym as a culminating activity 
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verb, and introduces a predicate of the form servicing (ep v [U), which is intended to hold 
at times where H is actually servicing \2\. 

Simple present signs are generated by a lexical rule that requires the base form to be a 
state. Thus, no simple present signs are generated for activity, culminating activity, or point 
verbs. This reflects the fact that (ignoring futurate meanings, which we do not address) these 
verbs are unlikely to be used in the simple present in a nldb context. For example, it is 



unlikely that (65) would refer to a currently ongoing service (i.e. with a reading that involves 
the non-habitual culminating activity verb); the present continuous would have been used 
instead. (|65| ) typically has a habitual meaning (that involves the habitual state verb). 

(65) Which company services BA737? 

In (|65|), this arrangement assigns only a habitual sign to "services" (the habitual sign is similar 
to (B3), with an additional Pres operator). This causes ( |65[ ) to receive only (p6[), which asks 
for the current habitual servicer of BA737. 

(66) ?c v Ntense[now*, company (c v )] A Pres{habserv(c v , 6a737)] 

In contrast, the simple past lexical rules generate both habitual and non-habitual simple past 
signs (derived from ( |63| ) and (|64[)). Hence, ( |67| ) receives both (|6S|) and (|69|). These correspond 
to two possible readings of (|67l): (|68l ) asks for the 1990 habitual servicers of BA737, while 
( p9| ) asks for companies that actually serviced (possible just once) BA737 in 1990. (The role 
of the Ntense operators will be discussed in section |4.6| .) 

(67) Which companies serviced BA737 in 1990? 

(68) lc v Ntense[e v , company (c v )] A At[yl990, Past[e v , hab_serv(c v ,6a737)]] 

(69) ?c v Ntense[e v , company (c v )] A At[y 1990, Past [e v , Culm[servicing(ep v , c v , 6a737)]]] 



4.3 Temporal adverbials 

Let us now consider the adverbial of ( [TOD. The lexicon provides multiple signs for prepositions 
that introduce temporal adverbials. ( |7l|) shows the "at" sign that is used when an "at . . . 
" adverbial modifies a state or point expression. (The head of ( f71~] ) is shown separately in 

©0 

(70) J.Adams was inspecting BA737 at 5:00pm. 



(71) 



PHON { at) 

""head 

SUBJ 
COMPS 
ASPECT 

ss | cont Aipim 



SS I LOC I CAT 





(np 
point 



(72) 



prep 



MOD | LOC | CAT | ASPECT state V point 

s|VfORM /traj:[I] V VP^VFORM pspj:[H 



MOD 



According to ([71]), "at" requires as its complement a noun phrase that denotes a minute- 
periodj^ Requiring the complement of (fn|) to denote a minute-period ensures that © will 

4 For readers familiar with HPSG, minjper is a subsort of index. In our HPSG version, indices are divided into 
subsorts that correspond to types of world entities. These subsorts are used to enforce selectional restrictions. 
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not be used with a noun phrase complement like "Monday" or "gate 2". (There are other "at" 
signs for uses of "at" in sentences like "BA737 arrived at gate 2".) The SS|loc|cat| aspect of 
( |7l| ) is the aspectual class of the modified expression after attaching the "at ... " adverbial. 
This will be discussed further below. 

(72) (the head of J7l|)) shows that ( |7l"| ) can be used when "at" introduces a modifier of a 
state or point expression. It also requires the expression being modified to be a finite sentence 
(a finite verb form that has combined with its complements and subject) or a past-participle 
(psp) verb phrase. We generally require temporal adverbials (e.g. "at 5:00pm", "on Monday") 
to modify whole finite sentences (e.g. "J.Adams inspected BA737") rather than finite verb 
phrases (e.g. "inspected BA737"), because in HPSG the latter approach would lead to problems 
with questions like (73): in (|73|), "was" combines in one step with both its subject "J.Adams" 
and its complement "BA737", following the head-subject-complement schema of hence, 
there would be no verb phrase constituent (verb that has combined with its complements but 
not its subject) to be modified by "at 5:00pm". 

(73) Was J.Adams inspecting BA737 at 5:00pm? 

Temporal adverbials are also allowed to modify past-participle verb phrases (see (|72|)). This 
is needed to be able to generate both readings of past perfect sentences like (f44|). We discuss 
this in following paragraphs. 

When "at" combines with "5:00pm" in @, "at 5:00pm" receives <M). 



(74) 



phon (at, 5:00pm) 

head ^ ' 

SUBJ {) 

SS | LOC | CAT 

COMPS {) 

ASPECT point 

ss|cont AfplCD] 
QSTORE | Part [5:00pm 9 , [TJ]j 

The QSTORE (quantifier storage) of ( |74"|) shows that the top variable represented by [D is 
existentially quantified over all 5:00pm-periods. (qstore is intuitively a set containing quan- 
tifiers along with their restrictions. Currently, all top variables are implicitly existentially 
quantified. This is why no explicit existential quantifier is shown in (|74j).) 

The "at 5:00pm" then attaches to "J.Adams was inspecting BA737", and ( |70|) receives 
(|75|). The principles of HPSG cause |7|) to inherit the head of (|l[), and the CONT and QSTORE 
of (|74|) . Our aspect principle (section [D]) also causes (|75|) to inherit the point aspect of 
(|74|). In effect, the point-specifying adverbial has caused the originally state sentence to 
become point. (This aspectual shift is needed to block some impossible readings in sentences 
with multiple temporal adverbials; see Q.) 
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(75) 



phon ( J.Adams. , was, inspecting, BA 737, at, 5:00pm) 

fin 



SS LOC CAT 



VFORM 
AUX 



+ 



HEAD 

verb 

ASPECT point 
SUBJ {} 
COMPS {} 

ss I CONT At^H, 

Past[e v , inspecting (ep v ,ja, ba 737)] 

QSTORE I Part [5: 00pm 3 , CD] j 



To obtain the TOP translation of (|70|), the expressions in the QSTORE of (|7g) have to be 
added in front of the CONT formula. (The variable-representing CD also has to be replaced 
by an actual top variable.) This leads to (|76|), which requires the inspection to have been 
ongoing at some 5:00pm-minute. (The reader is reminded that assertions are treated as yes/no 
questions. (|73|) receives the same formula.) 



(76) Part[5:00pm9 J v ] A At[f v , Past[e v ,inspecting(ep v , ja,ba737)}} 

These examples also demonstrate an important area in which our framework needs to 
be extended. In @ (or (^)) the user would typically have a particular 5:00pm-minute 
in mind (e.g. the 5:00pm-minute of the current or a previously mentioned day; see section 



5.5.1 of 34]). A temporal anaphora resolution mechanism would be needed to determine that 
particular minute from the context, but our framework currently provides no such mechanism. 
The answer to ( |73|) would therefore be affirmative if J.Adams was inspecting BA737 at any 
past 5:00pm-minute. A similar mechanism is also needed to cope with the anaphoric nature 
of tenses [61]. (|77|), for example, is typically a request to report flights that were circling at a 
particular contextually salient past time. In contrast, our framework currently generates an 
answer that reports all the flights that were circling at any past time. 

(77) Which flights were circling? 

Let us now consider the case where an "at ... " adverbial attaches to a culminating 
activity. The intended meaning is usually that the situation either starts or reaches its 



completion at the time of the adverbial. The first reading is the preferred one in (78). The 
second reading is the preferred one in (|7£ 



(78) J.Adams inspected BA737 at 5:00pm. 

(79) BA737 landed at 5:00pm. 

There is an "at" sign for each reading. (There are also additional signs for cases where "at . . . 
" adverbials modify activities. Cases where "at ... " adverbials modify states or points are 
covered by @.) @ is the sign for the reading where the situation of the culminating 
activity starts at the adverbial's time. The head of ( |80| ) is shown in (|8l|). 

(80) 



at) 

r HEAD 

SUBJ 
COMPS 
ASPECT 

ss | cont At JS, Begin p 



PHON 



SS LOC CAT 



& 



(np 
point 



■_perP~l^ 
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(81) 



prep 



MOD | LOC | CAT | ASPECT culmact 

MOD S^VFORM fin^M V Vp|vFORM pspj:[U 



Along with fl57|), ( p0|) causes (|78j) to be mapped to (|82|). (|82|) requires the period that covers 
the entire inspection (from start to completion) to be located in the past, and its start-point 
to fall within a 5:00pm-minute. 

(82) Part[5:00pm9, f v ] A At[f v , Begin[ Past[e v , Culm[inspecting(ep v , ja, 6o737)]]]] 

The sign for the reading where the situation reaches its completion at the adverbial's time is 
the same as (|80|), except that it introduces an End rather than a Begin operator. That sign 
leads to a second formula for fl7q), that contains an End instead of a Begin. That formula 



corresponds to an unlikely reading in the case of (781), but it would capture the preferred 



reading in (ua). A complete NLDB would typically paraphrase both formulae, and ask the 



user to select the intended one (see [22] for related discussion). Our prototype nltdb simply 
attempts to generate a separate answer for each formula. 

We now explain briefly the reason for allowing temporal adverbials to modify not only 



finite sentences, but also past-participle verb phrases. In (44) (see section above), this 
allows "on 1/1/95" to attach either to "had J.Adams inspected BA737" or to "inspected 
BA737". That is, (f44|) is taken to have two possible parses, sketched in (|83|) and (p4). These 



give rise to ( J4"5| ) and ( |46| ) respectively, that express the possible readings of (44). 

(83) [Had J.A. [inspected BA737]] on 1/1/95? 

(84) Had J.A. [[inspected BA737] on 1/1/95]? 

Like "at ... " adverbials, "on ... " adverbials introduce At operators. The Past and Perf 
operators of ( jig ) and ( f46| ) are introduced by the "had" auxiliary, while the Culm operator 
is introduced by the lexical entry of the past participle "inspected" . In (|84|), the adverbial 
attaches to "inspected BA737" before "had". Hence, in ( f46| ) the At ends up within the Past 
and Perf . In (]83|), in contrast, the "had" attaches to the verb phrase first. This causes the 



At in (45) to have wider scope over the Past and Perf. 
4.4 Fronted temporal modifiers 

When temporal modifiers attach to sentences, we allow them to either follow or precede 



the sentences they modify. This admits both ( p5|) and (86). Both sentences receive (p7|). 
(An alternative approach would be to allow temporal modifiers to participate in unbounded 
dependencies; see pp. 176 - 181 of (491.) 



(85) Tank 2 was empty on Monday. 

(86) On Monday tank 2 was empty. 

(87) Part[monday 9 ,m v ] A At[m v , Past[e v , empty (tk2)}} 

When temporal modifiers attach to past-participle verb phrases, we require the modifiers to 
follow the verb phrases, as in (|88[). This rules out unacceptable sentences like (|S9"|), 

BA737 had [[entered sector 2] at 5:00pm]. 
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(89) *BA737 had at 5:00pm entered sector 2. 

Our approach causes (|90| ) to receive only (^) (where the adverbial specifies the reference 
time), because in ([X]) "at 5:00pm" can modify only "BA737 had entered sector 2"; it cannot 
modify just "entered sector 2" because of the intervening "BA737 had". 

(90) At 5:00pm BA737 had entered sector 2. 

(91) Part[5:00pm9, f v ] A At[f v , Past[el v , Perf[e2 v , enter(ba737, s2)]}] 

In contrast, (p^) receives both ( |9l"| ) and (p3|), because in that case the adverbial can modify 
either "BA737 had entered sector 2" or "entered sector 2". In ( |93"|) the adverbial specifies 
directly the time of the entrance. 

(92) BA737 had entered sector 2 at 5:00pm. 

(93) Part[5:00pm9 , f v ] A Past[el v , Perf[e2 v , At[f v , enter(ba737, s2)]}] 

The fact that (^) does not receive ( p3| ) does not seem to be a disadvantage, because in ( p0[ ) 
the reading where the adverbial specifies directly the time of the entrance seems unlikely. 

There is, however, a complication with our approach: when a sentence is modified by 
both a preceding and a trailing temporal modifier (as in (p4|)), two parses are generated: 
one where the trailing modifier attaches first to the sentence, and one where the preceding 
modifier attaches first. In (|94|), this generates two semantically equivalent formulae, ( |95| ) and 
(ptt). Our framework currently provides no mechanism to eliminate one of the two. 

(94) On 1/7/95 BA737 was at gate 2 for two hours. 

(95) At[l/7/95,For[hour c ,2, Past[e v , location jo f(ba737, g2)]]] 

(96) For[hour c ,2, At[l/7/95, Past[e v , location jo f(ba737, g2)}}} 



4.5 Temporal subordinate clauses 

Like temporal adverbials, temporal subordinate clauses are treated as modifiers of finite sen- 
tences or past-participle verb phrases. In (|97|), for example, "while runway 2 was open" is 
taken to modify "did any Eight circle". As with temporal adverbials, when the modifier at- 
taches to a sentence, it is allowed to either precede or follow the sentence. This admits both 



(gg) and fl9|). 

(97) Did any flight circle while runway 2 was open? 

(98) While runway 2 was open, did any flight circle? 

Space does not permit a detailed presentation of our treatment of temporal subordinate 
clauses. We note only that the words that introduce the subordinate clauses (e.g. "while", 
"before") receive several signs in the lexicon, that are sensitive to the aspectual classes of both 
the main and the subordinate clause. There are, for example, two "after" signs for the case 
where the main clause is a point (e.g. "which Eights departed" in the airport domain) and 
the "after ..." clause is a state (e.g. "system 32 was in operation"). The first sign introduces 
an At operator, whose first argument is the TOP representation of the subordinate clause, 
and the second argument is the top representation of the main clause. The second sign is 
similar, but it also introduces a Begin operator in front of the formula of the subordinate 
clause. These two signs cause (|9~9"|) to receive (|100|) and ( |10lD . 
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(99) Which flights departed after system 32 was in operation? 

(100) If flight(f v ) A After[Past [el v , in_oper(sys32)], Past [e2 v , depart(f v )]] 

(101) lf v flight(f v ) A After[Begin[Past[er,in_oper(sys32)]],Past[e2 v ,depart(f v )]] 



(|99j) is ambiguous between a reading where the departures must have happened after the 
beginning of a maximal period where system 32 was in operation, and a reading where the 
departures must have happened after the end of such a maximal period. The two readings 



are captured by (|100| ) and (101) respectively. 



4.6 Nominals and temporal reference 

Following a distinction introduced in [48], we use different signs for predicative and non- 



predicative uses of nouns (e.g. "president" is predicative in ( |102[) , but non-predicative in 
(104)). Predicative noun signs contribute predicates (president(w v ) in (|103j )) that always end 



up within the operators of the verb's tense (the Past in ( |l03j )). (There is a special sign for 



the non-auxiliary "was" in (102). This introduces a Past operator, and inserts the predicate 



of "president" into the second argument-slot of the Past.) This arrangement captures the 



fact that predicative nouns always refer to the time of the verb tense {e v in ( 103 ), the event 
time that has to fall within 1991). 

(102) Who was the president in 1991? 

(103) ?w v At[yl991, Past[e v ,president(w v )}] 

(104) The president visited Rome in 1991. 

In contrast, non-predicative nouns can generally refer either to the time of the verb tense, 
or to the speech time. As already discussed in section 3.10| , for example, the "president" 



of ( |104D may refer to the person who was the president at the time of the visit, or to the 
present president. Non-predicative nouns can actually refer to any contextually salient time 
(see [jl7| , psj ] , and |3^] ) , but limiting these salient times to st and the time of the verb tense 
is a reasonable simplification at this stage of development of nltdbs, since a more thorough 
treatment would require a complete system of temporal anaphora. 

To account for the possible meanings of non-predicative nouns, non-predicative noun signs 
place the nouns' predicates within Ntense operators (e.g. Ntense[t v , president (p v )]). These 
predicates (along with the Ntenses) are inserted into the quantifier storage, and at the end 
of the parsing they are added in front of the rest of the formula, outside the operators of the 



verb tenses. This causes (104) to receive (105). 



(105) Ntense[t v , president (p v )] A At[yl991, Past[e v ,visiting(p v ,rome)]] 

In the formulae that are generated at the end of the parsing, the Ntenses require the nouns' 



predicates to hold at unspecified times (t v in ( |105|) ). During a subsequent post-processing 



phase (to be discussed in section 4^8), each Ntense generally leads to multiple formulae ( (|54] 



and (|55|) in the case of ( |105|) ), where the first argument of the Ntense has been replaced 
by now* or a variable that occurs as a first argument of a Past or Perf operator. These 
formulae capture the readings where the noun refers to the speech time or the time of the 
verb tense respectively. (In past perfect sentences, the "time of the verb tense" can be either 
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the reference time or the time of the past situation.) In present sentences, like (106) that is 



initially mapped to (|107| ), this arrangement correctly generates only one formula, where the 
first argument of the Ntense is now* (current president; in this case there is no Past or Perf 
operator) . 

(106) The president is visiting Rome. 

(107) Ntense[t v , president (p v )] A Pres[visiting(p v ,rome)] 

This strategy has the disadvantage that it leads to a multiplicity of formulae in past sentences 
with several non-predicative nouns. Our prototype NLTDB uses a rather ad hoc mechanism 
to reduce drastically the number of generated formulae, that is based on the observation that 
with many nouns, satisfactory answers can be obtained by assuming that they refer always 
to st, or always to the time of the verb tense. We do not discuss this mechanism here (see 
§)• 

4.7 Interrogatives 

The interrogatives "who" and "what" are treated syntactically as noun phrases. For example, 
(|108| ) receives the same syntactic analysis as (|11C| ). (We ignore punctuation.) The only 
difference from ordinary noun phrases is that "who" and "what" introduce interrogative 



quantifiers. (|108f ) and ( |110|) receive ( |109|) and ( |111[) respectively. 

(108) Who was inspecting BA737? 

(109) ?w" Past[inspecting(ep v ,w v , 6a737)] 

(110) J.Adams was inspecting BA737. 

(111) Past[inspecting{ep v , ja, 6a737)] 

Similarly, the interrogative "which" is treated in the same way as other noun-phrase deter- 
miners (e.g. "a" in ( |114| )), except that it introduces an interrogative quantifier. (|ll2j ) and 
(|114|) receive ([11 3D and (|115|) respectively. 



(112) Which tank is empty? 

(113) Ik" Ntense[now*,tank{k v )} A Pres[empty(k v )] 

(114) A tank is empty. 

(115) Ntense[now* ,tank(k v )] A Pres[empty(k v )] 

Our HPSG version also supports questions with multiple interrogatives, like ( |116| ) which is 
mapped to (|117| ), ( |116|) receives the same syntactic analysis as ( |110| ). 



(116) Who was inspecting what. 

(117) ?wl v ?w2 v Past[e v , inspecting (ep v ,vuV ,^2")] 

Questions like (118), where the interrogative refers to the object of the verb, are treated using 



hpsg's unbounded-dependencies mechanisms (see |2| and [^] for details). 
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(118) Which flight did J.Adams inspect? 



Finally, the "when" of ( |119| ) is treated syntactically as a temporal-adverbial modifier of 
finite sentences. That is, ( |119| ) receives the same syntactic analysis as (120). 



(119) When was tank 2 empty? 

(120) On 1/7/95 was tank 2 empty? 

The "when" of (|119|) introduces a ? mx ; quantifier (see section 3.2). In the formula that is 



generated at the end of the parsing, the variable of this quantifier does not occur elsewhere in 
the formula (e.g. (119) initially receives (121)). During the post-processing phase (discussed 



in section below), this variable is replaced by the first argument of a Past or Perf operator 
introduced by the main clause. In ( |121| ), this replaces f by e v , leading to a formula that asks 
for the past maximal intervals where tank 2 was empty. 

(121) l mxl t v Past[e v , empty (tk2)} 
4.8 Post-processing 

Before being translated into TSQL2, the top formulae that are generated by the parsing 
undergo an additional post-processing phase. This is a collection of minor transformations 
that cannot be carried out easily during the parsing. Two of these transformations have 



already been discussed in sections 4.6 and 4.7: replacing the variables of Ntense operators 



and ? mx ; quantifiers with variables that occur as arguments of Past or Perf operators. 

The third post-processing transformation is needed when "for ... " duration adverbials 
combine with verb forms that normally imply that a climax was reached. In these cases, 
the "for ... " adverbials (when felicitous) cancel the normal implication that the climax was 



reached. (122), for example, carries no implication that BA737 arrived at gate 2 (cf. (|124|)) 



The problem is that the parsing of ( |122| ) generates (123), which requires the taxiing to have 
been completed. 

(122) BA737 taxied to gate 2 for five minutes. 

(123) For[minute c , 5, Past [e v , Culm[taxiingjto{ep v , 6a737, <?2)]] 

(124) BA737 taxied to gate 2. 

Roughly speaking, to solve this problem during the post-processing any Culm operator that 
is within the scope of a For operator is removed. In (|123| ), this results in a formula that no 
longer requires the taxiing to have been completed. 

"In ... " duration adverbials also introduce For operators (e.g. parsing ( |125| ) generates 
(|123| )). For operators are actually marked with a flag that shows if the operator was intro- 
duced by a "for ... "or "in ... " adverbial, and the post-processing removes only Culm 
operators from within For operators introduced by "for ... " adverbials. This distinction is 
necessary because unlike "for ..." adverbials, "in ..." duration adverbials do not cancel the 
implication that the climax was reached ( (|123| ) is the correct representation of ( |125| )). 



(125) BA737 taxied to gate 2 in five minutes. 

Admittedly the post-processing compromises the compositionality of our English to top 
mapping. This, however, does not seem to have any practical disadvantages. 
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4.9 The linguistic coverage 

We conclude the discussion of our English to top mapping with a list of temporal linguistic 
phenomena that our framework can or cannot handle. There is a wealth of temporal mecha- 
nisms in English (and most natural languages), and it would have been impossible to consider 
all of them in the time that we had available. Instead, we targetted a carefully selected set of 
temporal mechanisms, which is both non-trivial and representative of many types of temporal 
phenomena. Of course, perfect grammatical/semantic descriptions are not achievable, even 
within non-computational projects, so we are well aware that our proposed analyses are no 
more than useful simplifications. Nevertheless, we believe that our framework can serve as a 
basis for further work that will extend its coverage of temporal phenomena (see also section 

Verb tenses: We support six tenses: simple present, simple past, present continuous, past 
continuous, present perfect (treated as simple past), and past perfect. We have not considered 
the past perfect continuous, future tenses, or futurate readings of present and past tenses (e.g. 



the futurate reading of the simple present - see section 2.2). We do not expect adding more 
tenses to be particularly difficult. 

Temporal verbs: Among special temporal verbs, we support "to start", "to begin", "to 
stop", and "to finish". We have not considered other special temporal verbs, like "to happen" 
or "to follow". 

Temporal nouns: Our framework supports nouns like "year", "month", "day", etc. (and 
proper names like "Monday", "January" , "1/5/92"). Nouns referring to more abstract tem- 
poral entities (e.g. "period", "event"), and nouns that introduce situations (e.g. "the con- 
struction of bridge 2") are not supported. 

Temporal adjectives: No temporal adjectives (e.g. "first", "earliest") are supported, with 
the exception of "current", that we use to restrict the times to which some nouns can refer 
(e.g. "current president"; see section |4.6| ). 

Temporal adverbials: Our framework supports temporal location adverbials introduced 
by "at", "on", "in", "before", and "after" (e.g. "at 5:00pm", "on Monday", "in January", 
"before 5:00pm", "after 1/1/95"), as well as "yesterday", "today", and duration adverbials 
introduced by "for" and "in" ("for two days", "in two hours"). Other adverbials that specify 
boundaries (e.g. "since 5:00pm" , "until 1/1/95") should be easy to add. Frequency adverbials 
(e.g. "twice", "daily") seem more difficult to support. 

Temporal clauses: Only temporal subordinate clauses introduced by "while", "before", 
and "after" are currently handled. Adding support for other boundary-specifying clauses 
(e.g. "since J.Adams became manager", "until tank 2 is empty") should not be difficult. 



"When ... " clauses are known to be the most difficult to support (see [^] and [53] for 
related discussion). 

Temporal anaphora: Among temporal anaphoric phenomena, only the anaphoric nature 
of nominals like "the president" is supported (section 0). As discussed in section |4.3| , 
our framework currently provides no mechanism to determine the particular times to which 
adverbials like "at 5:00pm" or verb tenses refer. 
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5 The TSQL2 language 



We now turn to TSQL2, the database language into which we translate the TOP formulae that 
the linguistic processing generates. TSQL2 [54] is a temporal extension of SQL [39|, the de 



facto standard database language. Traditional database languages such as SQL can and have 
been used to handle temporal information, leading some researchers to question the need for 



special temporal support in database languages (see [19] for related discussion). The lack of 
special temporal support in these languages, however, makes it difficult to manipulate time- 
dependent information. TSQL2, for example, adds to SQL a period datatype, and keywords to 
compute the intersection of two periods, to check if two periods overlap, etc. Without these, 
the rules that translate from top to database language (to be discussed in section |6|) would 
be much more difficult to formulate, and much harder to comprehend. 

We also note that until recently there was little consensus among temporal database 
researchers on how temporal support should be added to SQL (or other database languages), 
with every researcher essentially adopting his/her own temporal database language. Perhaps 
as a result of this, only a very few database management systems (dbmss) that support these 
languages have been implemented (mostly early prototypes; see Q). In contrast, TSQL2 was 
designed by a committee comprising most leading temporal database researchers, and hence it 
has a much better chance of being supported by (or at least influencing) forthcoming temporal 
database systems. As already mentioned in section [l|, a prototype database system (called 
timedb) that supports a version of TSQL2 has already appeared. 

Like SQL, TSQL2 is designed to be used with the relational database model, where infor- 
mation is stored in relations, intuitively tables consisting of rows (called tuples) and columns 
(attributes). As an example of a TSQL2 relation, tank-contents shows the contents of various 
tanks over time. 



tank -contents 


tank 


content 




tankl 


oil 


{[1/1/95, 31/5/95], 






[1/9/95, 1/5/96]} 


tankl 


empty 


{[1/6/95, 31/8/95]} 


tankl 


water 


{[1/1/95, 31/5/95], 






[1/8/95, 31/8/95]} 


tankl 


empty 


{[1/6/95, 31/7/95], 






[1/9/95, 1/5/96]} 


tank3 


foam 


{[1/1/95, 1/5/96]} 



tank-contents has two explicit attributes (tank and content), and an unnamed time-stamping 
attribute that shows when the information of each tuple was true (e.g. tankl contained oil from 
1/1/95 to 31/5/95, and from 1/9/95 to 1/5/96). We use a double line to separate the time- 
stamping attribute from the explicit ones, tank-contents is in a normalised form where there 
is no pair of tuples with the same explicit attribute values, and the time-stamps (the values 
of the time-stamping attribute) are sets of maximal periods. For simplicity, we assume in this 
example that time-stamps are specified at the granularity of days. In the airport domain, the 
time-stamps are actually specified at the granularity of minutes (e.g. the last time-stamp of 
tank_contents above would be something like {[8:32am 1/1/95, 4:16pm 1/5/96]}). 

Information can be retrieved from the database using TSQL2 SELECT statements (these 
are temporally enhanced forms of the corresponding SQL statements). For example, ( |126[ ) 
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generates the relation in (127), that shows the tanks that were empty at any time during July 
1995. 



(126) SELECT SNAPSHOT tl.tank 
FROM tank_contents AS tl 
WHERE tl. content = 'empty' 

AND VALID(tl) OVERLAPS PERIOD 

(127) 



[1/7/95 - 31/7/95] 



result of (126) 



tank 



tankl 
tankl 



The tl in the FROM clause of ( |126| ) is a correlation name. It can be thought of as a tuple- 
variable that ranges over the tuples of tankjcontents. The WHERE clause allows tl to range 
over only tuples whose content is empty, and whose time-stamp overlaps the period from 
1/7/95 to 31/7/95 (VALID(tl) refers to the time-stamp of the tl-tuple). Finally, the SELECT 
clause specifies that the result should be a snapshot relation (a relation with no time-stamping 
attribute), whose only attribute contains the tank values of the tl-tuples. 

As a further example, ( |128| ) generates ( |129| ), where each time-stamp is a single maximal 
period where the corresponding tank was empty. 

(128) SELECT tl.tank 
VALID VALID(tl) 

FROM tank_contents (PERIOD) AS tl 
WHERE tl. content = 'empty' 

(129) 



result of (12S) 


tank 




tankl 
tankl 
tankl 


[1/6/95, 31/8/95] 
[1/6/95, 31/7/95] 
[1/9/95, 1/5/96] 



The (PERIOD) in the FROM clause of ( |128| ) generates an intermediate relation that is the same 
as tank_contents, except that there is a separate tuple for each period in the time-stamps 
of tankjcontents (a tuple for tankl, oil, and [1/1/95, 31/5/95], a tuple for tankl, oil, and 
[1/9/95, 1/5/96], etc.). The tl of ( |128j ) ranges over tuples of the intermediate relation whose 
content is empty. The SELECT and VALID clauses specify that the resulting relation should 
have one explicit and one time-stamping attribute, and that the values of these attributes 
should be the tank values and time-stamps of the tl-tuples. 

To simplify the mapping from top to TSQL2 (to be discussed in section ^), and to bypass 
some obscure points in the definition of TSQL2, we had to introduce some modifications in 
TSQL2.0 These are relatively few and well-documented (in Q), to the extent that we are 
confident that they do not undermine the research value of our top to TSQL2 mapping. For 
example, we assume that attributes are ordered, and we refer to attributes by number rather 
than by name (e.g. we write (|130| ) instead of ( |128| )). 



'Our work also revealed a number of possible improvements to TSQL2. These were reported in 
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(130) SELECT tl.l 

VALID VALID(tl) 

FROM tank_contents (PERIOD) AS tl 
WHERE tl.2 = 'empty' 

The most significant of our modifications is the addition of a (SUBPERIOD) keyword. This is 



intended to be used with relations like (129), where each time-stamp is a single maximal pe- 
riod. When applied to Q129D , (SUBPERIOD) generates ( |132j ), that apart from the tuples of ( |129[ ) 
contains tuples for all the subperiods of the original time-stamps. The use of (SUBPERIOD) is 

.. (p 



illustrated in (131), where "... ( |130| ) ... " stands for the statement of ( |130| ) (the current 
versions of SQL and TSQL2 allow embedded SELECT statements in the FROM clause). (]131|) 
generates a relation that contains all the tuples of ( |132j ) whose time-stamp is a subperiod of 
[4/6/95, 27/8/95]. 

(131) SELECT t2.1 

VALID VALID (t2) 

) (SUBPERIOD) AS t2 
27/8/95] 



FROM ( . 
WHERE PERIOD 



(130) 



[4/6/95 



CONTAINS VALID (t2) 



(132) 



result of (130() 


tank 




tankl 
tankl 
tankl 
tankl 


[1/6/95, 31/8/95] 
[3/6/95, 30/8/95] 
[5/6/95, 26/8/95] 
[1/7/95, 1/7/95] 


tankl 
tankl 


[1/6/95, 31/7/95] 
[4/6/95, 19/7/95] 


tankl 
tankl 


[1/9/95, 1/5/96] 
[3/9/95, 1/2/96] 



It is important that an implementation of the (SUBPERIOD) construct does not involve explicitly 
computing and storing this entire relation, as this would involve vast amounts of space. The 
regular structure of the relation, and the very fact that it is computable from a formula, 
allows a database system to use a virtual relation, which behaves as if it contained all the 
many tuples of the entire relation, without the need for explicit representation of each one. 



6 From TOP to TSQL2 

We now discuss how top formulae are translated into TSQL2. Although a method to translate 
from a form of temporal logic to TSQL2 has already been proposed jjj, that method is not 
applicable to nltdbs, because it is not coupled to a systematic mapping from natural language 
to logic, and because the nature of the logic of 0] makes devising such a mapping very difficult 
(this is discussed further in Q). 

6.1 Primitive TOP expressions 

One central issue for any kind of nldb is that of portability. Customising an nldb or nltdb 
to handle a new or different domain is bound to involve some human effort in defining how 
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natural language expressions or lexical entries relate to symbolic structures in the database 
and/or expressions in the database query language. A well-designed system will limit the 
range of structures that have to be altered, and make it very clear what items must be edited 
and in what way. We have attempted to do this by specifying a well-defined set of functions 
(the h! functions of figure [[]) which form this particular interface. 

The person who configures the nltdb for a new database has to specify how primitive 
top expressions like constants and predicate functors relate to TSQL2 expressions. In the case 
of top constants, the configurer has to specify a function h' cons that maps each constant to 
a TSQL2 literal that represents the same entity in the world. For example, the top constant 
tk2 (used, for example, in ( |121| )) represents the same tank as the attribute value tank2 in 
tank -contents, and this attribute value is written in TSQL2 as 'tank2'; i.e. h' cons (tk2) = 
'tank2'. Similarly, we use the TOP constant yl991 (e.g. in ( |105| )) to refer to the time- 
period that is written in TSQL2 as PERIOD ' [1/1/91 - 31/12/91] '; i.e. h' cons (yl991) = PERIOD 
' [1/1/91 - 31/12/91] '. 

In the case of top predicate functors (e.g. empty in (|121| )), the configurer has to specify a 
function h' p ^ uns that maps each functor to a TSQL2 SELECT statement.^] That statement must 
generate a relation of the same form as ( |129j ) that shows for each combination of predicate 
arguments, the maximal periods where the situation of the predicate is true (i.e. h' p f uns (empty) 
would be the statement (|12g| ) that generates (|129|) ).n With predicate functors introduced by 
culminating activity verbs (e.g. inspecting in ( |117| )), the configurer specifies an additional 
SELECT statement. This generates a relation that shows for each combination of predicate 
arguments, whether or not the situation of the predicate has reached its climax. We do not 
discuss this here (see |2[). 

To some extent, the specification of h! functions can be supported by standard templates 
for the configurer to use, or by adopting rewrite rules that specify the values of the b! functions 
for whole classes of top expressions. In a particular domain, for example, it may be appro- 
priate to use a rewrite rule stating that top constants containing underscores are mapped 
to multi-word TSQL2 strings by replacing underscores with spaces and capitalising the first 
letter of each word in the TSQL2 string (this sets h' cons (united_kingdom) = 'United Kingdom'). 
Rewrite rules of this kind can be easily incorporated into our prototype nltdb, where the h! 
functions are implemented as Prolog rules. 

6.2 The translation rules 

The TOP formulae are translated into TSQL2 by a set of translation rules of the form: 

trans(4>, A) = E 

where (j) is the formula to be translated, A is a TSQL2 expression that intuitively represents 
what was the localisation time when (j) was encountered (this will become clearer below), 
and S is the TSQL2 translation of (j) (S is always a SELECT statement). There are: (i) basic 
(non-recursive) translation rules for the cases where eft is a atomic formula or a formula of the 

6 In Q h' p f uns has an additional argument that corresponds to the arity of the predicate. For simplicity, we 
ignore this argument here. 

7 This assumes that each predicate can be mapped directly to a relation computed from the contents of 
the database. There are cases (e.g. the "doctor-on-board" problem |Ho[) where this is not possible, and an 
intermediate reasoning layer is needed. See ^ for discussion on how this reasoning layer would fit into our 
framework. 
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form Culm[(j)'], and (ii) recursive translation rules for more complex formulae, that translate 
4> by calling other translation rules to translate subformulae of (p. 

For example, ignoring some details, the translation rule for the case where 4> is an atomic 
formula 7t(ti, . . . , r ra ) is the following: 

trans(n(ri, . . . ,r n ), A) = 
SELECT a.i, a.2, . . . , a.n 
VALID VALID (a) 

FROM (ti pfuns (ir)) (SUBPERIOD) AS a 
WHERE . . . 
AND ... 

AND . . . 

AND A CONTAINS VALID (a) 

Whenever the translation rule is used, a is a new correlation name (section ||). The "... " 
in the WHERE clause are all the strings in S\ U S2: 

S x = {"a.i = ti cons (n)" \i£ {1,2,3,..., n} 

and Ti is a top constant} 
S 2 = {"a.i = a. f I i,j G {1,2,3,... ,n}, 

Ti = Tj, and Ti,Tj are TOP variables} 

Informally, S2 contains restrictions that are needed when a variable appears at more than one 
predicate-argument positions. The S2 restrictions ensure that the attributes of the resulting 
relation that correspond to these argument positions have the same values in each tuple. 
Let us assume, for example, that we use the rule above to translate the formula empty(tk v ), 



and that A is set to the Ai of (133), where x is the TSQL2 name of the finest available granularity 
(the level of chronons in TSQL2 terminology). The TIMESTAMP 'now' - INTERVAL '1' x below 
refers to the time-point immediately before st. 

(133) Ai = INTERSECT ( 

PERIOD (TIMESTAMP 'beginning', 

TIMESTAMP 'forever) , 

PERIOD (TIMESTAMP 'beginning', 

TIMESTAMP 'now' - INTERVAL ' 1 ' x) ) 

Ai is the TSQL2 representation of [tfi rst , ti ast ] n [t first-, st), where tf irst and ti ast are the 
earliest and latest time-points respectively (like top, TSQL2 assumes that time is discrete 
and bounded). [tfi rs t, ti ast ] D [tfirst-, st) is, of course, equal to [t/j rs t, st). We use Ai here 
to make this example easier to integrate with later examples. The translation rule maps 
empty(tk v ) to ( |134|) , where h' p j, uns (empty) is (|128| ). Assuming that 1/5/96 is in the past, 
(|134| ) generates (|132| ), that shows for all the possible values of tk v , the past event-time periods 



where empty(tk v ) is true that fall within It (the period of Ai). (The WHERE clause of (134) has 
no effect in this case, because It = [tfi rs t, st), and all the time-stamps of the t2-tuples are 
subperiods of It.) 

(134) SELECT t2.1 

VALID VALID (t2) 

FROM (h' p j uns (empty)) (SUBPERIOD) AS t2 
WHERE Ai CONTAINS VALID (t2) 
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In this particular example, Si and S2 are empty. If the predicate contained constants as 
arguments (e.g. empty (tk2)), S± would introduce restrictions in the WHERE clause that would 
cause the resulting relation to contain information only about the world entities of those 
constants. In empty(tk2), for example, ( |134| ) would contain the additional restriction t2.1 = 
'tank2' in its WHERE clause, which would remove from the result tuples that do not refer to 
tank 2. 

Let us now consider the recursive translation rule for the case where (ft has the form 
Past\(3, (ft'] ((3 is a top variable, and (ft' a formula): 

trans(Past[(3, (ft'], A) = 

SELECT VALID (a) , a.l, a. 2, ... , a.n 

VALID VALID (a) 

FROM (trans((ft' , A')) AS a 

Every time the translation rule is used, a is a new correlation name, n is the number of 
explicit attributes of the relation generated by trans((p' , A'), and A' is the TSQL2 expression: 
INTERSECT (A, PERIOD (TIMESTAMP 'beginning' .TIMESTAMP 'now' - INTERVAL '1' x)) 

The translation rule for Past\(3, (ft'] narrows the period of A (intuitively It) to its intersection 
with [t fi rs t, st) (this is in accordance with the semantics of the Past operator; see section |3.ip . 

Let us assume, for example, that the rule above is used to translate Past[e v , empty (tk v )], 
and that A has been set to the A2 of Q135| ) (that specifies a period that covers the entire 
time-axis). Past[e v , empty (tk v )] would be mapped to (|136| ), where trans(empty(tk v ), Ai) is 

flU). 

(135) A 2 = PERIOD (TIMESTAMP 'beginning' , TIMESTAMP 'forever') 

(136) SELECT VALID(t3), t3 . 1 
VALID VALID (t3) 

FROM (trans{empty{tk v ),\ 1 )) AS t3 

The translation rules for formulae with no interrogative quantifiers (e.g. Past[(5,(ft']) generate 
SELECT statements that return relations containing one explicit attribute for each predicate- 
argument and each variable that occurs as argument of a temporal operator. (|136| ) returns 
(]137| ) , where the first and second explicit attributes correspond to e v and tk v respectively, 
and the time-stamps represent event-time periods where Past[e v , empty (tk v )] holds, for the 
various combinations of e v and tk v values. 





tankl 


[1/6/95, 31/8/95] 




tankl 


[3/6/95, 30/8/95] 




tankl 


[5/6/95, 26/8/95] 




tankl 


[1/7/95, 1/7/95] 




tankl 


[1/6/95, 31/7/95] 




tank2 


[4/6/95, 19/7/95] 




tankl 


[1/9/95, 1/5/96] 




tankl 


[3/9/95, 1/2/96] 



The SELECT clause of ( |136| ) specifies that (|137| ) must have two explicit attributes, and that 
in each tuple the first explicit attribute (the one for e v ) must have the same value as the 
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time-stamp. (To save space, we do not show the values of the first explicit attribute in (137).) 
This reflects the fact that in Past[e v , empty (tk v )] the semantics of Past requires the value of 
e v to be the event time of empty (tk v ) (section |3.2|). 



When translating from TOP to TSQL2, A is initially set to the A2 of ( |135| ). (This reflects the 
fact that It initially covers the whole time-axis - section |3.1| .) For example, the TSQL2 transla- 
tion of (138) ( "Was anything empty?" (at any past time)) is trans(Past[e v , empty(tk v )], A2), 



i.e. ( 136 ). In the case of yes/no questions, the answer is affirmative if the generated TSQL2 



code returns a relation with at least one tuple, and negative otherwise. In (138), the generated 



TSQL2 code (i.e. ( 136 )) returns (137), and hence the answer would be affirmative. 



(138) Past[e v ,empty(tk v )] 

As a final example of a translation rule, we show below the rule for the case where <f> 
has the form 1j5\ ?/?2 • • 4>' (Pi, Pi, ■ ■ ■ , Pk are TOP variables, and <p' a formula). This 
type of formula is intended to reflect the meaning of w/i-questions in English, with each ?/% 



corresponding to a wh-word ("what", "which", etc.; see sections 3.2 and |4.7| ). 

trans(Wi ?#a ■ • ■ Wk <f» ', A) = 

SELECT SNAPSHOT a.UJi , a.uj 2 , . ■ ■ , a.uj k 

FROM trans(4>' , A) AS a 

Whenever this rule is used, a is a new correlation name, ui, UJ2, ■ ■ ■ , cok are numbers referring 
to the explicit attributes of the relation returned by trans(4>' , A). The wi-th, W2-th, . . . , lo^- 
th explicit attributes of that relation must correspond to Pi, Pi, ■ ■ ■ , Pk respectively. (This 
correspondance is defined formally in [Q].) 

For example, the TSQL2 translation of <fi =?tk v Past[e v , empty(tk v )] is trans(4>, A2), which 



according to the translation rule above is ( |139| ). (The trans(Past[e v , empty(tk v )], A2) in ( 139 ) 
is 00 



(139) SELECT SNAPSHOT t4.2 

FROM (trans(Past[e v ,empty(tk v )},\ 2 )) AS t4 



( |139| ) generates a one-attribute snapshot relation that lists all the tanks whose names appear 
in the second explicit attribute of ( |137| ) (i.e. tanks that were empty in the past). 



6.3 Outline of the correctness proof 

We have proven formally (in Q) the correctness of the TOP to TSQL2 translation rules. The 
full details of this proof are highly formal and voluminous, and hence beyond the scope of 
this paper. Here, we only attempt to provide an outline of our approach. 

We assume that a top expression and a TSQL2 query both refer to some universe of world 
entities, relationships, etc. (much as in first-order logic), including temporal entities (such 
as periods). The denotation, in terms of this universe, of a top formula is provided by 
the definition of top's semantics (see Q). The denotation of a TSQL2 expression is given 
indirectly, in that we assume that there is an "evaluation" function which will map a TSQL2 
query to some database constructs (attribute values, relations, etc.), and the semantics of 
the database indicates how such constructs are related to the universe of world entities, 
relationships, periods, etc. The situation is roughly as in figure §. 

We have proven that the translation rules are correct, in the sense that the denotation of 
a top formula (roughly speaking its answer) as defined by the semantics of top is the same 
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TOP formula 



path 1 




universe of world entities, 
relationships, etc. 



path 2 
TSQL2 query 



semantics of 
database constructs 



database 
constructs 



evaluation of 
TSQL2 query 



Figure 2: From TOP to world entities 



as the denotation (answer) of the corresponding TSQL2 query when determined by way of 
the "evaluation" function and the database semantics. In terms of figure ^ above, the same 
semantic content is reached whether path 1 or path 2 is chosen. 



7 Comparison to previous work 

Although there has been a significant body of work on tense and aspect theories, temporal 
logics, and temporal database, only a few researchers have attempted to apply this work on 
nldbs, most notably Clifford [0, [0], H, De et al. @, ||, and Moens gO|, gf|, ||. 

Clifford's work constitutes the most significant previous approach to nltdbs. Clifford 
defines formally a temporal version of the relational database model, and a fragment of 
English that can be used to query databases structured according to his model. Following the 
Montague tradition [26], Clifford employs an intensional higher-order language (called IL S ) 
to represent the semantics of the English questions, and a temporally enhanced version of 
Montague's ptq grammar to map from English to il s . Clifford's coverage of English, however, 
is extremely narrow, and the semantics of the English temporal mechanisms are oversimplified. 
For example, perfect and continuous tenses are not supported, and no aspectual taxonomy 



(section ||) is employed. Clifford also defines a version of relational algebra [57] for his database 
model (this can be thought of as a theoretical database query language), and sketches a 
method to translate from IL S to that algebra. The description of this mapping, however, is 
very informal, and with no proof of its correctness. There is also no indication that Clifford's 
theory was ever used to implement an actual nltdb. 

De et al. describe a question-answering system that can handle a fragment of English 
questions involving time. The "temporal database" in this case is a rather ad hoc collection 
of facts and inference rules, expressed in a kind of logic-programming language. There is 
no clear intermediate meaning representation language, and it is very difficult to distinguish 
the part of the system that is responsible for the linguistic processing from the part of the 
system that is responsible for retrieving information from the database. De et al. consider 
this an advantage, but it clearly sacrifices modularity and portability (cf. remarks in section 
|6.1| above). For example, it is very hard to see which parts of the software would have to be 
modified if the natural language processor were to be used with a commercial DBMS. De et 
al.'s system does not seem to be based on any clear linguistic analysis. There is also very little 
information in [20| and [21| on exactly which temporal linguistic mechanisms are supported, 
and which semantics are assigned to these mechanisms. 

Moens' work on temporal linguistic phenomena has been highly influential in the area of 
tense and aspect theories. In the last part of Q] (see also [41 1), Moens develops a simplistic 
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NLTDB. This has a very limited linguistic coverage, and is mainly intended to illustrate 
Moens' tense and aspect theory, rather than to constitute a detailed exploration of nltdbs. 
As in the case of De et al., Moens' "database" is a rather idiosyncratic collection of facts and 
inference rules (expressed in Prolog). Moens' system uses a subset of Prolog as its meaning 
representation language. English questions are translated into expressions of this subset 



using a DCG grammar [46|, and there are Prolog rules that directly evaluate the meaning 
representation language expressions against the database. Moens provides essentially no 
information about the DCG grammar. The definition of the meaning representation language 
is also very unclear (e.g. it is difficult to see exactly which Prolog expressions are part of 
the representation language), and the semantics of the language is defined very informally 
(by listing Prolog code that evaluates some of the possible expressions of the representation 
language) . 

We argue in || that all previous approaches to nltdbs suffer from one or more of the 
following: (i) they ignore important English temporal mechanisms, or (ii) they assign to them 
over-simplified semantics (both (i) and (ii) apply to Clifford and De et al.), (iii) they lack 
clearly defined meaning representation languages (De et al. and Moens), (iv) they do not 
provide complete descriptions of the mappings from natural language to meaning representa- 
tion language (De et al., Moens), or (v) from meaning representation language to database 
language (Clifford), (vi) they adopt idiosyncratic (Clifford) and possibly also not well-defined 
database models or languages (De et al., Moens), (vii) they do not demonstrate that their 
ideas are implementable (Clifford). 

Our framework avoids completely pitfalls (iii) - (vi) : our meaning representation language 
(top) is completely and formally defined (see appendix [A], and also §); the mapping from 
English to top is based on a well-known and widely used grammar framework (hpsg), and 
all our modifications to it are well-documented (in Q ) ; we have adopted a database language 
(TSQL2) and its underlying database model that the temporal database community is con- 
sidering as a basis for a standard; and we have developed a formally defined (and provably 
correct) method to translate from TOP to TSQL2. 

Our proposal is open to criticism on (i), in the sense that several important temporal 
linguistic mechanisms remain to be added to our linguistic coverage. The temporal mecha- 
nisms that our framework supports, however, are assigned sufficiently elaborate semantics, 
and therefore our framework does not suffer from (ii). Finally, our framework is partially 
open to criticism on point (vii), in the sense that our prototype nltdb has not been linked 
to an actual DBMS. We are confident that our framework is sufficiently well-documented and 
concretely founded, to the extent that future work will be able to improve it with regard to 
points (i) and (vii). 

8 Conclusions and further work 

We have presented a theoretical framework for building nltdbs, and a prototype NLTDB that 
proves the implement ability of that framework. We have also argued that our framework 
constitutes an improvement over previous work on nltdbs. We discuss below some possible 
extensions to our work. 

Extending the linguistic coverage: We have already pointed out the need to extend the 
linguistic coverage of our framework. Among the temporal linguistic mechanisms that we have 
not considered (section |4.9| ), temporal anaphoric phenomena (discussed briefly in section 4.3) 
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seem to be particularly interesting: several researchers have examined temporal anaphora 
(e.g. p^l , [j3l| , fjijl , [45 1, |6l[| ), and it would be interesting to explore the applicability of 



their proposals to nltdbs. A Wizard of Oz experiment could also be performed to determine 
which temporal phenomena most urgently need to be added to the linguistic coverage of our 
framework, and to collect sample questions that could be used as a test suite for nltdbs [|37j . 
(In a Wizard of Oz experiment, users interact through terminals with a person that pretends 
to be a natural language interface; see p3|.) 



Cooperative responses: In |2[ we reach the conclusion that cooperative responses [35 are 
particularly important in nltdbs, and that a mechanism to generate such responses should 
be added to our framework. For example, ( |140| ) is currently mapped to a top formula that 
requires BA737 to have reached gate 2 for the answer to be affirmative. This causes a negative 
response to be generated if BA737 was taxiing to gate 2 but never reached it. While a simple 
negative response is strictly speaking correct, it is hardly satisfactory in this case. A more 
cooperative response like ( |141 ) is needed. 



(140) Did BA737 taxi to gate 2? 

(141) BA737 was taxiing to gate 2, but never reached it. 



As noted by |3jJ and others, cooperative responses can also be a useful enhancement to a 
nldb's capabilities even where the reply is positive: 

(142) Did BA737 taxi to gate 2? 

(143) BA737 taxied to gate 2 at 9:05am on 10/11/94 



Linking to a DBMS: We have already discussed the need to link the prototype nltdb to 
a TSQL2 DBMS (timedb being currently the only option we know of). 

Extending the prototype NLTDB: As already noted in section [l], several components 
would have to be added to the prototype nltdb, if it were to be used in real-life applications 
(e.g. a lexical preprocessor, a module for paraphrasing the user's questions, a module for 
quantifier-scoping, a reasoning component, configuration tools, etc.; see [Q] for discussion on 
how these would fit into our existing architecture). As most of these components are not 
directly sensitive to temporal issues, it should be possible to implement them by borrowing 
existing techniques from conventional (non-temporal) NLDBs. 

Optimising the TSQL2 code: Although provably correct, the queries that the top to 
TSQL2 translator generates are often verbose. It is usually easy for a TSQL2 programmer to see 
ways to shorten significantly the generated TSQL2 code. Long database language expressions 
can confuse dbmss (especially prototype ones), leading them to unacceptable performance. 
A TSQL2 optimiser is needed that will apply simplifications to the TSQL2 code of the top 
to TSQL2 translator, before the TSQL2 queries are passed to the DBMS (see @] for related 
discussion). 
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A Appendix : The Top Language 

A.l The Syntax of TOP 
Notation 

{ } : Used to group elements in BNF. Not part of TOP language. 
F* : Zero or more repetitions of F. 
F + : One or more repetitions of F. 

: Separates different possible RHSs of BNF rule. 
(n) : Footnote - indicates some further condition on rule. 

Terminal symbols are in lower case, usually with an initial capital. Nonterminals are in upper 
case. 

Grammar 

The distinguished symbol is FORMS. 

FORMS -> YNFORMS | WHFORMS 
WHFORMS -»• WHF0RMS1 \ WHFORMS2 
WHFORMSl -> {? VARS}+ YNFORMS 
WHFORMS2 ^? mxl VARS{? VARS}* YNFORMS 
YNFORMS -» AFORMS 

| YNFORMS A YNFORMS 

| Pres [ YNFORMS] 

| Past[VARS, YNFORMS] 

| P erf [VARS, YNFORMS] 

| At[TERMS, YNFORMS] 

| At[YN FORMS, YNFORMS] 

| Before[TERMS, YNFORMS] 

| Before [ YNFORMS, YNFORMS] 

| After [TERMS, YNFORMS] 

| After [ YNFORMS , YNFORMS] 

| Ntense[VARS, YNFORMS] 

| Ntense[now*, YNFORMS] 

| For[CPARTS, VQTY, YNFORMS] 

| Fills [ YNFORMS] 

| Begin[YNFORMS] 

| End [ YNFORMS] 

| Culm [LITERAL] 

AFORMS -> LITERAL \ P art[P ARTS , VARS] \ P art[P ARTS , VARS ,VORD] 

LITERAL PFUNS({ TERMS, }* TERMS) 

TERMS -> C0iV5 | TMiJS 1 

P^i?T,S -> CPARTS | GPARTS 

VORD^O | -1 | -2 | ... 

VQTY -» 1 | 2 | 3 | . . . 

PFUNS, CPARTS, GPARTS, CONS, VARS are disjoint open classes of terminal symbols. 



(1) 
(2) 



(3) 
(3) 
(4) 

(4) 

(4) 

(3) 
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Footnotes 

(1) : Each VARS terminal which appears in the {? VARS} + expression must occur inside 
the YNFORMS expression. 

(2) : Each VARS terminal which appears in the {? VARS} + expression must occur in- 
side the YNFORMS expression, and the VARS terminal of the 1 mx i VARS expres- 
sion occurs at least once inside the YNFORMS expression as the first argument of 
a Past, Perf, At, Before, After, or Ntense operator, or as the second argument of a 
Part operator. 

(3) : The VARS expression does not occur in the FORMS expression. 

(4) : If the TERMS expression is a VARS, then the VARS expression does not occur in 
the YNFORMS expression. 

A.2 Semantics of TOP 
Ontology 

Point structure: A point structure for top is an ordered pair (PTS, -<), such that PTS is 
a non-empty set, -< is a binary relation over PTS x PTS, and {PTS, -<) has the following five 
properties: 

transitivity: If t±,t2,t 3 G PTS, t\ -< t2, and £2 ~< t 3 , then t\ -< t 3 . 
irreflexivity: If t G PTS, then t < t does not hold. 

linearity: If t\,ti G PTS and t\ / £2, then exactly one of the following holds: t\ -< t2 or 
t 2 -< h. 

left and right boundedness: There is a i/i rs t G PTS, such that for all t G PTS, tfi rs t ^ t. 
Similarly, there is a ti ast G PTS, such that for all t G PTS, t ^ t; ast . 

discreteness: For every t±,t2 G PTS, with t\ ^ t 2 , there is at most a finite number of 
t 3 G PTS, such that t ± -< t 3 -< t 2 . 

The elements of PTS are called time-points. The following functions are defined on time- 
points: 

prev(t) and next(t): If t\ G PTS — {ti as t}, then next{t\) denotes a t2 G PTS, such that 
t\ -< t2 and for no t 3 G PTS is it true that t\ -< t 3 -< t2- Similarly, if t\ G PTS — {tfi rst }, 
then prev(t\) denotes a t2 G PTS, such that t2 -< t\ and for no *3 G PTS is it true that 
t 2 ~< t 3 -< t\. Here, whenever next(t) is used, it is assumed that t ^ ti ast . Similarly, whenever 
prev(t) is used, it is assumed that t / t first- 
Periods and instantaneous periods: A period p over (PTS, -<) is a non-empty subset of 
PTS with the following property: 

convexity: If <i, <2 G p, £3 G and t\ -< t 3 -< t2, then t 3 G p. 
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PERIODS (pts -<) is the set of all periods over (PTS, -<). If p G PERIODS /pts ,-<) and p con- 
tains only one time-point, then p is an instantaneous period over {PTS, -<). INSTANTS (pts ,-<) 
is the set of all instantaneous periods over {PTS, -<). For simplicity, we may write PERIODS 
and INSTANTS instead of PERIODS {PTS ^) and INSTANTS {PTS ^)- PERIODS* PTS<) (or 
simply PERIODS*) is the set PERIODS U {0}. 

Subperiods: j>i is a subperiod of p2j iff Pi,P2 G PERIODS and pi C p2- In this case 
we write pi C p2- (pi £>2 is weaker than pi C y>2 5 because it does not guarantee that 
Pi>P2 G PERIODS .) Similarly, pi is a proper subperiod of j?2j iff Pi,P2 G PERIODS and 
pi C P2 • In this case we write pi C P2 ■ 

Maximal periods: If S" is a set of periods, then mxlpers(S) is the set of maximal periods of 
5\ mxlpers(S) = {p £ S \ for no p' 6 5 is it true that p c f/}. 

minpt(S) and maxpt(S): If S C PTS, minpt(S) denotes the time-point t G S", such that 
for every t' € S, t -< t' . Similarly, if S C PTS, maxpt(S) denotes the time-point t G S, such 
that for every t' £ S, t' <t. 

Notation: Following standard conventions, [ti , ^2] denotes the set {t G PTS \ t\ -< t -< £2}- 
Similarly, (t\,t2\ denotes the set {t G PTS | t% -< t H £2}- [^1^2) and (ii,i2) are defined 
similarly. 



Model structure 

A top model M is an ordered 7-tuple: 

M = {{P TS , -< ) , OBJS , fcons ; fpfuns > f culms j fgparts j fcparts ) 



such that {PTS,^,) is a point structure for top (section |AT2j ), PERIODS (pts,^) ^ OBJS, 
and /cons, /pjwjs, /cuims, / 5pa rts, and /cparis are as specified below. 

OBJS: OBJS is a set containing all the objects in the modelled world that can be denoted 
by TOP terms. 

fcons : fcons is a function CONS 1— > OBJS, i.e. / cons maps each top constant to a world object. 

fpfuns : /p/uns is a function that maps each pair {it, n), where it G PFUNS and n£ {1,2,3,... }, 
to another function (OBJS) n 1— > pow(PERIODS) (where pou> (S) denotes the powerset of S). 

For every it G PFUNS and n£ {1,2,3,... }, fpf uns (TT, n) must have the following property: 
for every (01,02, . . . , o n ) G (OBJS) n , it must be the case that: 

if Pi,P2 G fpfunsin, n)(oi, o 2 , ■ ■ ■ , o n ) and pi U p 2 G PERIODS, then pi = p 2 

Intuitively, / p / uns shows the maximal periods where the situation represented by tt{t\, . . . , r n ) 
holds, for every combination of arguments T\, . . . , r n . 

fcuims^ f culms is a function that maps each pair {it, n), where it G PFUNS and n G {1, 2, 3, . . . }, 
to another function (OBJS) n 1— » {T, i 7 }. Intuitively, / cu ims shows whether or not a situation 
reaches a climax at the latest time-point where the situation is ongoing. 

fgparts : fgparts is a function that maps each element of GPARTS to a gappy partitioning. A 
gappy partitioning is a subset S of PERIODS , such that for every p±,p2 G S, p\T\p2 = 0, 
and \J peS p^PTS. 
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fcparts : fcparts is a function that maps each element of CPARTS to a complete partitioning. A 
complete partitioning is a subset S of PERIODS ', such that for every pi,p 2 £ 5*, pi np 2 = 0, 
and U p6 <?P = - PT5 '- 

Variable assignment 

A variable assignment w.r.t. a top model M is a function <? : VARS » OBJS (g assigns 
to each variable an element of OBJS). Gm, or simply G, is the set of all possible variable 
assignments w.r.t. M. 

If 5 € G, (3 G V^l-PiS 1 , and o € OBJS, then is the variable assignment defined as follows: 
= o, and for every /?' G VARS with /?' ^ /3, g%(f3') = g(fi). 



Denotation of a TOP expression 

In the following, we shall use the non-terminal symbols of the top syntax given in section 
A.l above as the names of sets containing expressions which can be analysed syntactically 
as the corresponding non-terminal. That is, where we stipulate, for example, that "</> G 
YNFORMS" , this means that "0 is a top expression which syntactically is a YNFORMS" . 
Also, we will assume that in the semantic definitions for complex expressions, all the various 



sub-expressions are of the correct syntactic types, as given by the grammar in section A.l 
above. 

An index of evaluation is an ordered 3-tuple (st,et,lt), such that st G PTS, et G 
PERIODS, and It G PERIODS*. 

The denotation of a TOP expression £ w.r.t. a model M, an index of evaluation (st, et, It), 
and a variable assignment g, is written ||£||' M '> s *> e *> It >fl or simply ||£|| s *> e *>">5. "When the denota- 
tion of £ does not depend on st, et, and It, we may write U\\ M ' 9 or simply 

• If re G CONS, then \\k\\ 9 = f cons {n). 

• If p G VARS, then = t/(/3). 

• If <f> G YNFORMS, then ||0||^>«.s G {T,F}. 

• If 7t(ti,t 2 , . . . ,r n ) G LITERAL, then ||^(ti,t 2 , . . . ,r n )|| s *' e *'^ = T iff et C ft and for 
some p mxi G/p/ uns (7r,n)(||ri||9,||r 2 || 9 ,...,||r n ||f), et Qp mx i. 

• If 4>i,4>2 £ YNFORMS, then ||0i A^ 2 || st ' et ' Z *' 9 = Tiff ||^i|| st ' et ' Zi '» = T and ||0 2 || s *' eiA9 = 
T. 

• ||Pari [<r, /?, v rd\ II s is P, iff all the following hold (below / = f cpar ts if <r G CPARTS, and 
/ = fgparts if a G G PARTS): 

g(P) e /(*), 

- if z; or . d = 0, then st G 

— if v or d < — 1, then the following set contains exactly —v rd — 1 elements: 

{p G /(c) | maxpt(g((3)) -< minpt(p) and maxpt(p) -< si} 

. ||Part[a,/3]||s = P iff 5 (/3) G /(a) (where / = / cparis if a G CPARTS, and / = / 3paris if 
<r G GPARTS). 
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• \\Pres[cl)]\\ st ' et ' lt >9 = T, iff st G et and ||0|| s *> e *As = T. 

• ||Past[/3,c/>]|| s *' ei '^ = T, iff = et and \\^ft,et,ltn[t first ,st),g = T 

. || Culm[TT(n, . . . ,T n )]\\ st ' et > lt '9 = T, iff et C it, / Ctt / ms (vr, n)(||ri|| 9 , • • • , \\T n \\ 9 ) = T, S + 0, 
and et = [mmpt(S'), 7710x^(6')], where: 

S= |J p 

PG/ p/tl „ s (7r,n)(||n|| 9 ,---,||-rn|| 9 ) 

• \\At[T,(f)}\\ st ^ lt ' 9 = T, iff ||r|| 9 € PERIODS and ||^|| s *> e ^* n IMI 9 >9 = r _ 

• ||i4i[^i,^ 2 ]|| 8 *' ct ' B > fl = T, iff for some et', 

et' G mxipers({e G PERIODS | \\<t>i\\ st > e > PTS > 9 = T}) and ||^ 2 ||^,et,/tnet', g = T> 

• ||5e/ore[r,0]|| st ' e *'"'» = T, iff ||r||» G PERIODS and ||^||^,et^n[t /ir3t ,mmpt(||r||»)),g = T> 

• \\Beforey> 1 ,^ 2 ]\\ st,et,lt ' 9 = T, iff for some et', 

et' G mxipers({e G PERIODS \ ||0i \\ st > e > PTS >9 = T}), and ||0 2 ||^,et,«tn[t /irst ,mm P t(et')),9 = 
T. 

• P/ter[r,(/)]|| s *' e *'^ = T, iff ||r||» G PERIODS and ||^||*t.et,Jtn(maipt(||r||9),t jMt ], fl = T> 

• ||A/ter[0i,0 2 ]|| st ' et ' Z '' 9 = T, iff for some et', et' G mxlpers({e \ \\c/)i\\ st > e > PTS ' 9 = T}) and 
||^ 2 ||st,et,Jtn(maxpt(et'),t Jost ],g _ j> 

• ||Fiiis[0]|| st ' e *'^ = T, iff et = it and = T . 

• \\Begin[(f>]\\ st ' et ' lt ' 9 = T, iff et C it and for some et' 

et' G mxipers({e G PERIODS | ||0||*WTS,p = T j) and et = { m i np t( e t')} . 

• \\End[<j>]\\ st > et ' lt > 9 = T, iff et C it and for some et' 

et' G mxipers({e G PERIODS \ ||</,|| S *^ PT -S, 9 = T}) and et = {maxpt(et')}. 

• \\Ntense\p, <j>\ \\ st > et < lt > 9 = T, iff for some et' G PERIODS, g{(3) = et' and W^et' ,pts, 9 = 
T. 

• ||Mense[nou>*,c/>]|| s *' e *'^ = T, iff ||^|| st .{^}>^5, 5 = T 

• ||F r[(T c , 0]||»*.et.".» = T, iff ||0||sW,<? = T , and for somepi,p 2 , . . . ,p Uqty G / cpa r is (^ c ), 
it is true that minpt{p\) = minpt(et), next{maxpt{p\)) = minpt{p2), next(maxpt(p2)) = 
minpt(pz), . . . , next{maxpt{p Uqty ^i)) = rninpt(p Uqty ), and maxpt(p Uqty ) = maxpt(et). 

• \\Perf[j3,(t)]\\ st > et > lt < 9 = T, iff et C it, and for some et' G PERIODS, it is true that 

= et', maxptiet') -< minpt(et), and ||0|| st ' et > PT,S >9 = T. 

. ||?/?! ?ft . . . ?/3 n 0||^.3 = {<</(/?!), . . .,<?(/?„)} I = T} 

• W?mxiPi ?ft ?/% ... ?/?n <l>\\ st ^ u > 9 = {{g(l3 1 ),g(02),g(0 3 ),...,9(l3 n )) | \\cf>\\^> lt > 9 = 
T,and for no et' G PERIODS and g' G G is it true that ||0||^et',«t, g ' = r g ^ c 
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The denotation of <fi w.r.t. M,st, written ||0|| ,st or simply ||</>|| s *, is denned only for TOP 
formulae: 

• If (j) € YNFORMS, then \\<p\\ st = 

- T, if for some g G G and et € PERIODS, ||0||^et,PT5, 9 = T) 

- F, otherwise 

. If e WHFORMS, then = U 9eG , etePWODS IH| st ' et ' PT5 ' ff - 

Within the NLTDB, each yes/no question (e.g. "Is tank 2 empty?") is mapped to a (f> £ 
YNFORMS (multiple formulae are generated for ambiguous questions). The answer is af- 
firmative iff \\(f)\\ st = T. With questions containing wh- words (e.g. "Who repaired what?"), 
the question is mapped to a (j) G WHFORMS (again multiple formulae are generated if the 
question is ambiguous), and the answer reports the tuples in \\(f)\\ st . 
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